Any mental technique to quickly deduce the required ifs/elses in a program with A LOT of conditional logic?

In programming, it is common for a piece of functionality to require a lot of conditional logic, but not quite enough to warrant a rules engine.
For example, testing whether a number is divisible by x but also a multiple of one thing, a factor of another, the square root of a third, etc. As you can imagine, something along these lines can easily involve a lot of ifs/elses.
While it is possible to reduce the clutter with more modern programming techniques, how do you quickly and, in a calculated fashion, deduce the required ifs/elses?
For example, in a program to deduce the necessary quote for a prospective car insurance customer (rules engines aside btw), there would be conditional logic for age, location, driving points, what age those points were collected at, etc. Is there any mental technique to quickly deduce the redundant conditional branches? Is it just plain experience and no special mental technique? This is important because when pair programming there is a lot of noise, and thus it is difficult to actually think something through or even get enough time to implement the idea.
Thanks

I would suggest that trying to do this sort of thing in your head is asking for trouble, and trying to do it with a partner is going to make it much worse. Sometimes you have to sit and think and even make some notes on paper. If you don't like propositional logic, try decision tables.
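For example, a decision table can be written down as data rather than as nested ifs. A minimal Java sketch (the conditions and multipliers are invented for illustration): each row enumerates one combination of conditions, so redundant or missing branches become visible at a glance.
public class QuoteDecisionTable {
    // each row: { isMinor, cleanRecord, isLocal, premiumMultiplier }
    static final Object[][] TABLE = {
        { true,  true,  true,  1.5 },
        { true,  false, true,  2.5 },
        { false, true,  true,  1.0 },
        { false, false, true,  1.8 },
        // ... the remaining combinations, enumerated explicitly
    };

    static double lookup(boolean isMinor, boolean cleanRecord, boolean isLocal) {
        for (Object[] row : TABLE)
            if (row[0].equals(isMinor) && row[1].equals(cleanRecord) && row[2].equals(isLocal))
                return (Double) row[3];
        throw new IllegalArgumentException("combination not covered by the table");
    }
}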

I would add simple, readable, short methods, such as:
IsMinor(..)
IsRecordClean(..)
And then use them in conjunction to create new methods with meaningful names, such as:
IsMeetingPreReqs(..) //which checks several "simple" conditions
IsValidForInsurance(..)
(Sorry for the examples, I'm struggling with my English here, but you get the point..)
IMO that will make your code much clearer, and thus reduce the chances of being confused by distractions.
Not mental per se, but kinda..
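For instance, a minimal sketch (the thresholds and signatures are invented, not from any real rulebook):
public class InsuranceRules {
    static boolean isMinor(int age) { return age < 18; }
    static boolean isRecordClean(int points) { return points == 0; }
    static boolean isLocal(double distanceMiles) { return distanceMiles <= 10.0; }

    // higher-level rules composed from the simple ones, reading like the domain language
    static boolean isMeetingPreReqs(int age, double distanceMiles) {
        return !isMinor(age) && isLocal(distanceMiles);
    }

    static boolean isValidForInsurance(int age, int points, double distanceMiles) {
        return isMeetingPreReqs(age, distanceMiles) && isRecordClean(points);
    }
}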

I would say you should use propositional logic.
Suppose that:
q = age is greater than 18
p = location is within 10 miles
r = driving points are less than 3
s = age is less than 18 when points are collected
you could say...
(using && for AND, as in Java; note that ^ is XOR in C-family languages)
if (q && p && r && s) {
//you are eligible or something!
} else {
//get outta here
}

Related

Why does class Money extend Expression in Kent Beck's TDD by Example?

I'm studying TDD by Example and so far I'm finding it a great book. But there's a point where he tells us to write:
// in class Money:
Expression plus(Money addend) {
return new Money(amount + addend.amount, currency);
}
Which won't build unless we declare:
class Money implements Expression {...
This doesn't really make sense to me. The author created Expression as an interface for Sum, and Money has nothing in common with Sum. Later on he adds a method reduce() common to both classes, but reduce() in Money merely returns this.
Making Money implement Expression is only the path of least effort for removing the error from the plus() method, but it fills the code with unnecessary information (it will have to implement reduce() because of this decision) and increases entropy.
I haven't given it much of a thought, but wouldn't it be cleaner to do something such as this?
class Money {
Money plus(Money addend) {
return new Money(amount + addend.amount, currency).reduce();
}
}
// edited this, it previously returned Expression
Edit:
In the following chapter, the author implements a reduce() method in another class (named Bank), which converts Money between currencies. I still find this a weird solution; the names Sum and Expression imply we should have a conversion expression class for this task instead. What the author is likely intending to do is to use recursion to add money in different currencies. Either way, it seems to me that he did some far-ahead planning, which seems incompatible with TDD as presented in the book.
On planning ahead
TDD doesn't prohibit planning ahead. The process is about getting rapid feedback on your plans instead of wasting days (or weeks) creating elaborate plans, only to see them 'not survive contact with reality' (to paraphrase Helmuth von Moltke). It's okay to think ahead.
Still, Kent Beck reveals in chapter 17 that this isn't his first rodeo:
"I have programmed money in production at least three times that I can think of. I have used it as an example in print another half-dozen times. I have programmed it live on stage [...] another fifteen times. I coded another three or four times preparing for writing [...] Then, while I was writing this, I thought of using expression as a metaphor and the design went in a completely different direction than before."
So, if you think that he's cheating: yes, he is. He's open about it, though. I think that the motivation was to present a compelling example. He also writes that it was in part based on early reviews of the book.
On the API
That doesn't explain why the code looks like it does, but there's a reason for that. It's actually a nice API.
Why can't you write something like return new Expression(amount + addend.amount, currency).reduce()?
You can't because the reduce method isn't nullary. It takes arguments. You must supply both a bank (which holds the currency conversion rates) and a destination currency.
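Paraphrasing from memory rather than quoting the book, the API has roughly this shape (bodies elided; treat it as a sketch, not Beck's exact code):
interface Expression {
    Money reduce(Bank bank, String to);   // needs a bank (the rates) and a target currency
    Expression plus(Expression addend);
}

class Money implements Expression { /* amount, currency; reduce converts via the bank */ }
class Sum implements Expression   { /* augend, addend; reduce reduces both and adds  */ }
class Bank { /* holds conversion rates; Money reduce(Expression source, String to)   */ }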
Keep in mind the problem being solved by the code. People get this wrong all the time, and I think that Kent Beck (inadvertently) fuelled the confusion by naming the example Money.
The problem isn't to model money, but to model investment portfolios in different currencies. If you have a portfolio of 25,000 USD and 10,000 CHF, reducing it to a single currency hides important details. With a portfolio in multiple currencies you diversify risk. Portfolio owners will want to see more than one view of their portfolios. Sometimes they want to see the portfolio grouped by currency, and at other times they want to see 'the current total worth' of the portfolio presented in a single currency.
The API in the book enables both views.
The reason the underlying 'metaphor' is called an expression is that the API is just a specialised expression tree. It turns out fairly well, though, because it's lawful. Restricted to the sub-types presented in the book, it informally gives rise to a monoid.
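Spelled out, the informal monoid claim is that plus is associative and has an identity, up to reduction: $(x \oplus y) \oplus z \sim x \oplus (y \oplus z)$ and $x \oplus 0 \sim x \sim 0 \oplus x$, where $\oplus$ stands for plus, $0$ for a zero Money, and $e_1 \sim e_2$ means both expressions reduce to the same Money for every bank and target currency. That is my reading of the claim, not a formulation from the book.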

Robust Measures of Algorithmic Trading - Based on Robert Pardo's Book

I am optimizing algorithmic strategies. In the process of choosing from a pool of many optimized strategies, I am in the phase of evaluating the robustness of each strategy.
Following the guidelines of Dr. Pardo's book "The Evaluation and Optimization of Trading Strategies", where in item 3 on page 231 he recommends applying the following ratio to the optimized data:
" 3. The ratio of the total profit of all profitable simulations divided by the
total profit of all simulationsis significantly positive"
The question: from the optimization results, I am not able to properly understand what Dr. Pardo means by "...all simulations is significantly positive". What does he mean by 'significantly positive'?
a.) with 95% confidence level?
b.) with a certain p value?
c.) the relation of the average net profit of each simulation minus its standard deviation?
Even though the sentence might seem 'simple', I would REALLY like to understand what Dr. Pardo means by the statement and HOW to calculate it, in order to filter the most robust algorithmic strategies.
The aim of analyzing the optimization profile of an algorithmic simulation is to be able to filter robust strategies.
Therefore the ratio should help us uncover whether the simulation results are on the right track or not.
So, we would like to impose some 'penalties' on our results, so we can separate the robust cases from those with doubtful (not robust) results.
I came up with the following penalizing measures (found in Dr. Pardo's book and other sources):
a.) we can use a market return (yearly value) as a benchmark, so that all simulations whose results are below that level can be excluded from our analysis,
b.) some other benchmark to separate the 'robust' results from the more 'doubtful' ones (for example, deducting one standard deviation from each result).
From (a) and (b), we can create the ratio:
the total sum of all profitable simulations divided by the profitable results considered robust
The ratio should be greater than or equal to 1.
If the ratio is equal to 1, it means our simulation has given interesting results (we are analyzing the positive values in this ratio, but profitable results should always be compared with the negative results as well).
If the ratio is greater than 1, then we have not reached the best possible scenario, and the result should be compared with the other optimization tests.
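To make the calculation concrete, here is a small Java sketch of that ratio; reducing the penalty rules (a)/(b) to a single benchmark threshold is my own simplification:
import java.util.List;

public class RobustnessRatio {
    // total profit of all profitable simulations, divided by the total
    // profit of the simulations considered robust (above the benchmark)
    static double ratio(List<Double> profits, double benchmark) {
        double profitable = 0.0, robust = 0.0;
        for (double p : profits) {
            if (p > 0) profitable += p;
            if (p > benchmark) robust += p;   // penalty rules (a)/(b), reduced to one threshold
        }
        return profitable / robust;           // >= 1 whenever benchmark >= 0
    }
}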
While simulating trading algorithms, no result is absolute but partial, and its value is taken in relation to what we expect from the algorithm.
If someone has a better explanation or idea or concept you might find interesting please share, I would gladly read it.
Best regards to all.
Remark on the subject
With all due respect to the book ( published in 2008 ), the term robustness has its own meaning if-and-only-if the statement also clarifies in which particular respect the robustness is measured, and against what phenomena the Model-under-review's response is to be exposed & tested ( against what perturbances -- type and scale -- the Model-under-test shall hold its robust behaviour, the measures of which were both defined and quantified a-priori the test ).
In any case where such context for the robustness is not defined, the material, be it printed under any bold name, sounds -- and forgive me for speaking in plain English -- just like a PR-story, an over-hyped e-zine headline or a paid advertorial.
Serious quantitative model evaluation, the more so if one strives to perform an optimisation ( with respect to some defined quantitative goal ), requires a more thorough insight into the subject than to axiomatically post a trivial "must-have" imperative of
large-average && small-HiLo-range && small StDev.
Any serious Quant-Modelling effort, if it is not to just spoil the hundreds-of-thousands of CPU core hours consumed by deep parametric-space scans, shall incorporate a serious parametrisation decision in each dimension of the main TruTrading Strategy sub-spaces --
{ aSelectPOLICY, aDetectPOLICY, anActPOLICY, anAllocatePOLICY, aTerminatePOLICY }
A failure to do so either cripples the model or leads to a blind belief, and it is hard to guess which of the two is the greater Quant-sin.
Remark on the cited hypothesis
The book states, without any effort to prove the construction, that:
The more robust trading strategy will have an optimization profile with a:
1. Large average profit
2. Small maximum-minimum range
3. Small standard deviation
Is it correct?
Now kindly spend a few moments and review this 4D-animated view of a Model-under-test ( the visualisation of which is reduced to just four dimensions for easier visual perception ), where none of the above holds true.
Based on contemporary state-of-the-art adaptive money-management practice, that postulate fails to be correct, be it due to a poor parametrisation ( thus artificially leading the model into a rather "flat-profits" sub-space of aParamSetVectorSPACE )
or due to a principal mis-concept, or a poor practice ( including the lack thereof ), in the implementation of the most powerful profit-booster ever -- the very money-management model sub-space.
Item 1 becomes insignificant altogether.
Item 2 works right on the contrary to the stated postulate.
Item 3 cannot yield anything but the opposite, due to 1 & 2 above.

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in Java. I want to initialize solutions with some heuristics and give these initial solutions to CPLEX to find a better solution based on the model, but I also want to limit the search to a specific neighborhood. Any idea how to do this?
Also, how can I limit the range of all variables? And what is best: implementing these heuristics and the local search in OPL itself, in Java, or even in C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: We have had a lot of success with approach b. In particular, it is simple to add constraints that fix some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints to limit the values to be similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables it can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i]) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i]) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;
forall(i in indexSet)
if (previousValue[i] <= 0.5)
changed[i] == (var[i] >= 0.5) // was zero before
else
changed[i] == (var[i] <= 0.5) // was one before
sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. Not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax - we just found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed fine-grained control of what we are doing, and find we need to create many more arrays and index sets in OPL to achieve what we want, while we can achieve the same effect with more intelligent loops etc without creating so many data structures in a language like C#.
Those are several questions. So here are some pointers and suggestions:
In Cplex, you give your model an initial solution with the use of IloOplCplexVectors()
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.)
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
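For what it's worth, here is a hedged Java/Concert sketch combining an initial solution (a MIP start) with approach (b); the real objective and constraints are omitted, and the variable sizes and numbers are placeholders:
import ilog.concert.*;
import ilog.cplex.*;

public class WarmStartNeighbourhood {
    public static void main(String[] args) throws IloException {
        IloCplex cplex = new IloCplex();
        IloNumVar[] x = cplex.numVarArray(3, 0, 10, IloNumVarType.Int);
        // ... the real objective and constraints would be built here ...

        double[] heuristic = { 2, 5, 7 };     // an initial solution from your heuristic
        cplex.addMIPStart(x, heuristic);      // hand it to CPLEX as a MIP start

        // approach (b): constrain the search to a neighbourhood of the heuristic
        for (int i = 0; i < x.length; i++) {
            cplex.addGe(x[i], heuristic[i] - 1);
            cplex.addLe(x[i], heuristic[i] + 2);
        }

        if (cplex.solve())
            System.out.println(java.util.Arrays.toString(cplex.getValues(x)));
        cplex.end();
    }
}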
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.

Building ranking with genetic algorithm

Question after a big edit:
I need to build a ranking using a genetic algorithm. I have data like this:
P(a>b)=0.9
P(b>c)=0.7
P(c>d)=0.8
P(b>d)=0.3
now, let's interpret a,b,c,d as names of football teams, and P(x>y) as the probability that x beats y. We want to build a ranking of the teams; we lack some observations - P(a>d) and P(a>c) are missing due to lack of matches between a vs d and a vs c.
The goal is to find the ordering of team names which best describes the current situation in that four-team league.
If we have only 4 teams then the solution is straightforward: first we compute the probabilities of all 4!=24 orderings of the four teams, ignoring the missing values. We have:
P(abcd)=P(a>b)P(b>c)P(c>d)P(b>d)
P(abdc)=P(a>b)P(b>c)(1-P(c>d))P(b>d)
...
P(dcba)=(1-P(a>b))(1-P(b>c))(1-P(c>d))(1-P(b>d))
and we choose the ranking with the highest probability. I don't want to use any other fitness function.
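In code, brute-force scoring of one ordering looks like this (a sketch; PairStat is my own helper type, using a Java 16 record for brevity):
import java.util.List;

public class OrderingProbability {
    // one observed statistic: P(x beats y) = p
    record PairStat(char x, char y, double p) {}

    // probability of a full ordering, ignoring unobserved pairs,
    // e.g. P(abcd) = P(a>b) * P(b>c) * P(c>d) * P(b>d)
    static double probability(String ordering, List<PairStat> stats) {
        double prob = 1.0;
        for (PairStat s : stats) {
            boolean xFirst = ordering.indexOf(s.x()) < ordering.indexOf(s.y());
            prob *= xFirst ? s.p() : 1.0 - s.p();
        }
        return prob;
    }
}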
My question:
As the number of permutations of n elements is n!, calculating the probabilities of all
orderings is impossible for large n (my n is about 40). I want to use a genetic algorithm for this problem.
The mutation operator is simply switching the places of two (or more) elements of the ranking.
But how do I make a crossover of two orderings?
Could P(abcd) be interpreted as the cost function of the path 'abcd' in an asymmetric TSP, where the cost of travelling from x to y differs from the cost of travelling from y to x, with P(x>y) = 1 - P(y>x)? There are so many crossover operators for the TSP, but I think I have to design my own crossover operator, because my problem is slightly different from TSP. Do you have any ideas for a solution, or a frame for conceptual analysis?
The easiest way, on the conceptual and implementation level, is to use a crossover operator which exchanges suborderings between two solutions:
CrossOver(ABcD,AcDB) = AcBD
for a random subset of elements (in this case 'a,b,d', in capital letters) we copy the first parent's subordering - the sequence of elements 'a,b,d' - into the second ordering.
Edit: an asymmetric TSP can be turned into a symmetric TSP, but with forbidden suborderings, which makes the GA approach unsuitable.
It's definitely an interesting problem, and it seems most of the answers and comments have focused on the semantic aspects of the problem (i.e., the meaning of the fitness function, etc.).
I'll chip in some information about the syntactic elements -- how do you do crossover and/or mutation in ways that make sense. Obviously, as you noted with the parallel to the TSP, you have a permutation problem. So if you want to use a GA, the natural representation of candidate solutions is simply an ordered list of your points, careful to avoid repetition -- that is, a permutation.
TSP is one such permutation problem, and there are a number of crossover operators (e.g., Edge Assembly Crossover) that you can take from TSP algorithms and use directly. However, I think you'll have problems with that approach. Basically, the problem is this: in TSP, the important quality of solutions is adjacency. That is, abcd has the same fitness as cdab, because it's the same tour, just starting and ending at a different city. In your example, absolute position is much more important than this notion of relative position. abcd means in a sense that a is the best point -- it's important that it came first in the list.
The key thing you have to do to get an effective crossover operator is to account for what the properties are in the parents that make them good, and try to extract and combine exactly those properties. Nick Radcliffe called this "respectful recombination" (note that the paper is quite old, and the theory is now understood a bit differently, but the principle is sound). Taking a TSP-designed operator and applying it to your problem will end up producing offspring that try to conserve irrelevant information from the parents.
You ideally need an operator that attempts to preserve absolute position in the string. The best one I know of offhand is known as Cycle Crossover (CX). I'm missing a good reference off the top of my head, but I can point you to some code where I implemented it as part of my graduate work. The basic idea of CX is fairly complicated to describe, and much easier to see in action. Take the following two points:
abcdefgh
cfhgedba
Pick a starting point in parent 1 at random. For simplicity, I'll just start at position 0 with the "a".
Now drop straight down into parent 2, and observe the value there (in this case, "c").
Now search for "c" in parent 1. We find it at position 2.
Now drop straight down again, and observe the "h" in parent 2, position 2.
Again, search for this "h" in parent 1, found at position 7.
Drop straight down and observe the "a" in parent 2.
At this point note that if we search for "a" in parent one, we reach a position where we've already been. Continuing past that will just cycle. In fact, we call the sequence of positions we visited (0, 2, 7) a "cycle". Note that we can simply exchange the values at these positions between the parents as a group and both parents will retain the permutation property, because we have the same three values at each position in the cycle for both parents, just in different orders.
Make the swap of the positions included in the cycle.
Note that this is only one cycle. You then repeat this process, starting from a new (unvisited) position each time, until all positions have been included in a cycle. After the one iteration described in the above steps, you get the following strings (where an "X" denotes a position in the cycle where the values were swapped between the parents).
cbhdefga
afcgedbh
X X    X
Just keep finding and swapping cycles until you're done.
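Here is a compact Java sketch of the idea (permutations as int[] over 0..n-1; swapping alternate cycles, which is one common variant):
public class CycleCrossover {
    // returns two offspring; values in alternating cycles are exchanged between the parents
    static int[][] crossover(int[] p1, int[] p2) {
        int n = p1.length;
        int[] c1 = p1.clone(), c2 = p2.clone();
        boolean[] visited = new boolean[n];
        int[] posInP1 = new int[n];
        for (int i = 0; i < n; i++) posInP1[p1[i]] = i;   // value -> position in parent 1

        boolean swap = true;                  // swap the first cycle, keep the next, ...
        for (int start = 0; start < n; start++) {
            if (visited[start]) continue;
            int i = start;
            do {                              // walk one cycle
                visited[i] = true;
                if (swap) { c1[i] = p2[i]; c2[i] = p1[i]; }
                i = posInP1[p2[i]];           // drop down to parent 2, search in parent 1
            } while (i != start);
            swap = !swap;
        }
        return new int[][] { c1, c2 };
    }
}
On the abcdefgh/cfhgedba example above (with a=0 ... h=7), the first cycle visits positions 0, 2, 7, and swapping it reproduces cbhdefga and afcgedbh.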
The code I linked from my github account is going to be tightly bound to my own metaheuristics framework, but I think it's a reasonably easy task to pull the basic algorithm out from the code and adapt it for your own system.
Note that you can potentially gain quite a lot from doing something more customized to your particular domain. I think something like CX will make a better black box algorithm than something based on a TSP operator, but black boxes are usually a last resort. Other people's suggestions might lead you to a better overall algorithm.
I've worked on a somewhat similar ranking problem and followed a technique similar to what I describe below. Does this work for you:
Assume the unknown value of an object diverges from your estimate via some distribution, say, the normal distribution. Interpret a ranking statement such as P(a>b) = 0.9 as the statement "The value of a lies at the 90th percentile of the distribution centered on b".
For every statement:
def realArrival = calculate a's location on a distribution centered on b
def arrivalGap = | realArrival - expectedArrival |
def fitness = Σ arrivalGap
Fitness function is MIN(fitness)
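One hedged way to turn that into runnable Java: score a candidate assignment of values by comparing the model's implied probability (via the normal CDF) with each observed one. The erf approximation is Abramowitz & Stegun 7.1.26; everything else is my own framing of the idea, not the exact method described above.
public class RankingFitness {
    // Abramowitz & Stegun 7.1.26 approximation of erf, |error| < 1.5e-7
    static double erf(double x) {
        double sign = x < 0 ? -1 : 1;
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    // standard normal CDF
    static double phi(double x) { return 0.5 * (1.0 + erf(x / Math.sqrt(2.0))); }

    // statements[i] = { indexOfA, indexOfB, observed P(a>b) }
    static double fitness(double[] values, double[][] statements) {
        double total = 0.0;
        for (double[] s : statements) {
            double modelP = phi(values[(int) s[0]] - values[(int) s[1]]);
            total += Math.abs(modelP - s[2]);   // the "arrival gap" for this statement
        }
        return total;   // the GA minimises this
    }
}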
FWIW, my problem was actually a bin-packing problem, where the equivalent of your "rank" statements were user-provided rankings (1, 2, 3, etc.). So not quite TSP, but NP-Hard. OTOH, bin-packing has a pseudo-polynomial solution proportional to accepted error, which is what I eventually used. I'm not quite sure that would work with your probabilistic ranking statements.
What an interesting problem! If I understand it, what you're really asking is:
"Given a weighted, directed graph, with each edge-weight in the graph representing the probability that the arc is drawn in the correct direction, return the complete sequence of nodes with maximum probability of being a topological sort of the graph."
So if your graph has N edges, there are 2^N graphs of varying likelihood, with some orderings appearing in more than one graph.
I don't know if this will help (very brief Google searches did not enlighten me, but maybe you'll have more success with more perseverance) but my thoughts are that looking for "topological sort" in conjunction with any of "probabilistic", "random", "noise," or "error" (because the edge weights can be considered as a reliability factor) might be helpful.
I strongly question your assertion, in your example, that P(a>c) is not needed, though. You know your application space best, but it seems to me that specifying P(a>c) = 0.99 will give a different fitness for f(abc) than specifying P(a>c) = 0.01.
You might want to throw in "Bayesian" as well, since you might be able to start to infer values for (in your example) P(a>c) given your conditions and hypothetical solutions. The problem is, "topological sort" and "Bayesian" is going to give you a whole bunch of hits related to Markov chains and Markov decision problems, which may or may not be helpful.

Grammatically correct double-noun identifiers, plural versions

Consider compounds of two nouns, which in natural English would most often appear in the form "noun of noun", e.g. "direction of light", "output of a filter". When programming, we usually write "LightDirection" and "FilterOutput".
Now, I have a problem with plural nouns. There are two cases:
1) singular of plural
e.g. "union of (two) sets", "intersection of (two) segments"
Which is correct, SetUnion and SegmentIntersection or SetsUnion and SegmentsIntersection?
2) plural of plural
There are two subcases:
(a) Many elements, each having many related elements, e.g. "outputs of filters"
(b) Many elements, each having single related element, e.g. "directions of vectors"
Shall I use FilterOutputs and VectorDirections or FiltersOutputs and VectorsDirections?
I suspect the correct one is the first version (FilterOutputs, VectorDirections), but I think it may lead to ambiguities, e.g.
FilterOutputs - many outputs of a single filter or many outputs of many filters?
LineSegmentProjections - projections of many segments or many projections of a single segment?
What are the general rules I should follow?
There's a grammatical misunderstanding lying behind this question. When we turn a phrase of form:
1. X of Y
into
2. Y X
the Y changes grammatical role from a noun in the possessive (1) to an adjective in the attributive (2). So while one may pluralise both X and Y in (1), one may only pluralise X in (2), because Y in (2) is an adjective, and adjectives do not have grammatical number.
Hence, e.g., SetsUnion is not in accordance with English. You're free to use it if it suits you, but you are courting unreadability, and I advise against it.
Postscript
In particular, consider two other possessive constructions, first the old-fashioned construction using the possessive pronoun "its", singular:
3a. Y, its X
the equivalent plural:
4a. Ys, their X
and their contractions, with 4b much less common than 3b:
3b. Y's X
4b. Ys' X
Here, SetsUnion suggests it is a rendering of the singular possessive type (3) Set's Union (=Set, its Union), where you intended to communicate the plural possessive (4) Sets, their Union (contracted to the less common Sets' Union).
So it's actively misleading.
Unless you're getting hamstrung by a convention-driven system (Ruby on Rails, CakePHP, etc.), why not use OutputsOfFilters, UnionOfSets, etc.? They may not be conventional, but they may be clearer.
For example, it's pretty clear that ProjectionOfLineSegments and ProjectionsOfLineSegment are different things, or even ProjectionsOfLineSegments....
Using plural forms of nouns can make them more difficult to read.
When you have a number of things, they are usually stored in a data structure - an array, a list, a map, a set, etc. - generically called a collection or abstract data type. The interface to a collection of items is typically part of the programming environment (e.g. Collections in Java and .NET, the STL in C++) and is well understood by developers to involve quantities of items.
You can avoid pluralizing your nouns, make the fact that you are dealing with multiple quantities explicit, and indicate how they are accessed, by incorporating the name of the collection. For example,
VectorDirectionList - the vectors and their directions are listed, e.g. some kind of Pair type. Works particularly well if you have a VectorDirection, combining a Vector and a Direction.
VectorDirectionMap - if the vector directions are mapped from vector.
Because it's a collection type, dealing with multiple objects is understood as endemic to it. That puts it in the same class as SetUnion - a union always involves at least 2 sets, and a VectorDirectionList makes it clear there can be more than one VectorDirection.
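For instance (stand-in types; a sketch only):
import java.util.List;
import java.util.Map;

class Vector {}          // stand-in domain types for illustration
class Direction {}

class VectorDirection {  // pairing the two nouns avoids pluralising either
    Vector vector;
    Direction direction;
}

class Holder {
    List<VectorDirection> vectorDirectionList;  // the plurality lives in the collection name
    Map<Vector, Direction> vectorDirectionMap;  // directions mapped from vectors
}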
I agree about avoiding homonyms where the word has more than one word class, e.g. Filter (and actually Set, although to my mind Set would not really be used in a class name as a verb, so I interpret it as a noun). I originally wrote this using FilterOutput as an example, but it didn't read well. Using a compound for Filter may help disambiguate - e.g. ImageFilterOutputs (or, applying my own advice, ImageFilterOutputList).
Avoiding plural forms with class names seems natural when you consider that an instance of a class is itself always one item - "an instance". If we use a plural name, then we get a mismatch - an instance trying to imply that it is multiple things - it itself is just one thing, even if it references multiple other things. The collection naming above builds on this - you have an instance which is a list, a map etc so there is no mismatch.
I'm assuming you are talking about programming language constructs, although the same thinking applies to tables/views. These are understood to involve quantities of items, and table names are consequently often singular (Customer, Order, Item) even though they store multiple rows. Many-to-many mapping tables are usually compounds of the entities being related, e.g. relating orders to items - OrderItem. In my experience, using plurals for table names makes the SQL difficult to read.
To sum up, I would avoid plural forms, as they make reading harder. There are sure to be cases where they are unavoidable - where the plural form is more readable than creating a huge name of nested entities and collections - but these are the exception rather than the rule.
What are the general rules I should follow?
Make it Clear -- for both visual and aural thinkers.
Make it Specific but Accurate.
Make it pass the "crowded room" or "emergency phone call" test.
To illustrate with the SetsUnion example:
"SetsUnion" is right out; It's easily confused for a typo and speaking it (even in your head) will confuse it for "Set's Union" (Or worse).
The plural is also implied, so the 2nd 's' is redundant.
SetUnion is better but still ambiguous.
UnionOfSets is clearer and should be the bare minimum standard.
But all of these, so far, are uselessly vague (unless you are working with pure mathematical theory).
The term really should be specific. For example, "Red cars", "Programmers who spent too much time on esoterica", etc.
These are all unions of sets, but they tell you something useful. ;-)
Finally, Phil Factor had the right of it. To paraphrase:
Can you shout a (term) out across a crowded room and have it keyed in, and successfully (used), by a listener at the other side?
Try yelling, "SetsUnion," or even, "UnionOfSets," across a packed Irish bar. ;-)
1) I would use SetUnion and SegmentIntersection because I think in this case the plurality is implied anyway, and it just looks nicer that way.
2) Again, I would use FilterOutputs and VectorDirections, for the same reason. You could always use MultipleFilterOutputs if you want to be more specific.
But ultimately it's entirely down to your personal preference.
I think that while general naming conventions and consistency are important, in a very tight/tricky algorithm clarity should trump convention. If it helps, use veryLongAndDescriptiveIdentifiers.
What's wrong with Union()?
Moreover, "union of sets" turns into "sets' union" (the two sets' union is ...); I'm sure I'm not the only person who's okay with CamelCase but not CamelsCaseMinusApostrophes. If it needs an apostrophe to make sense, don't use it. Set.Union() reads exactly like "union of set(s)".
Mathematicians will also say "the (set) union of A and B", or rarely "A and B's (set) union". "The sets' union of A and B" makes no sense!
Most people will also see Vector[] vectors and Direction[] vectorDirections and assume that vectors[i] corresponds to vectorDirections[i]. If things really get ambiguous, I use something like vector_by_index and vectorDirection_by_index. Then you can have Map<Filter,Output> output_by_filter or Map<Filter,Output[]> outputs_by_filter, which makes it very obvious what the key is (this is very important in Objective-C, where it's completely non-obvious what type the keys or values are).
If you really want, you can add an s and get vectors_by_index, but then consistency gives you the silly outputss_by_filter.
The right thing is, of course, something like struct FilterState { Filter filter; Output[] outputs; }; FilterState[] filterStates;.
I'd suggest singular for the first word: SetUnion, VectorDirections, etc.
Do a quick class search in your IDE for: Strings*, Sets*, Vectors*, Collections*
Anyway, whatever you choose, be consistent throughout the whole application.