Grammatically correct double-noun identifiers, plural versions - naming-conventions

Consider compounds of two nouns, which in natural English would most often appear in the form "noun of noun", e.g. "direction of light", "output of a filter". When programming, we usually write "LightDirection" and "FilterOutput".
Now, I have a problem with plural nouns. There are two cases:
1) singular of plural
e.g. "union of (two) sets", "intersection of (two) segments"
Which is correct, SetUnion and SegmentIntersection or SetsUnion and SegmentsIntersection?
2) plural of plural
There are two subcases:
(a) Many elements, each having many related elements, e.g. "outputs of filters"
(b) Many elements, each having single related element, e.g. "directions of vectors"
Shall I use FilterOutputs and VectorDirections or FiltersOutputs and VectorsDirections?
I suspect correct is the first version (FilterOutupts, VectorDirections), but I think it may lead to ambiguities, e.g.
FilterOutputs - many outputs of a single filter or many outputs of many filters?
LineSegmentProjections - projections of many segments or many projections of a single segment?
What are the general rules, I should follow?

There's a grammatical misunderstanding lying behind this question. When we turn a phrase of form:
1. X of Y
into
2. Y X
the Y changes grammatical role from a noun in the possessive (1) to an adjective in the attributive (2). So while one may pluralise both X and Y in (1), one may only pluralise X in (2), because Y in (2) is an adjective, and adjectives do not have grammatical number.
Hence, e.g., SetsUnion is not in accordance with English. You're free to use it if it suits you, but you are courting unreadability, and I advise against it.
Postscript
In particular, consider two other possessive constructions, first the old-fashioned construction using the possessive pronoun "its", singular:
3a. Y, its X
the equivalent plural:
4a. Ys, their X
and their contractions, with 4b much less common than 3b:
3b. Y's X
4b. Ys' X
Here, SetsUnion suggests it is a rendering of the singular possessive type (3) Set's Union (=Set, its Union), where you intended to communicate the plural possessive (4) Sets, their Union (contracted to the less common Sets' Union).
So it's actively misleading.

Unless you're getting hamstrung by a convention driven system (ruby on rails, cakePHP etc), why not use OutputsOfFilters, UnionOfSets etc? They may not be conventional but they may be clearer.
For example its pretty clear that ProjectionOfLineSegments and ProjectionsOfLineSegment are different things or even ProjectionsOfLineSegments....

Using plural forms of nouns can make them more difficult to read.
When you have a number of things, they are usually stored in a datastructure - an array, a list, a map, set, etc.. generically called a collection or abstract data type. The interface to a collection of items is typically part of the programming environment (e.g. Collections in java and .net, STL in C++) and is well understood by developers to involve quantities of items.
You can avoid pluralizing your nouns, and make the fact that you are dealing with multiple quantities explicit, and indicate how they are accessed by incorporating the name of the collection. For example,
VectorDirectionList - the vectors and their directions are listed, e.g. some kind of Pair type. Works particularly well if you have a VectorDirection, combining a Vector and a Direction.
VectorDirectionMap - if the vector directions are mapped from vector.
Because it's a collection type, dealing with multiple objects is understood as it is endemic to a collection type. It then puts it in the same class as SetUnion - a union always involves at least 2 sets, and a VectorDirectionList makes it clear there can be more than one VectorDirection.
I agree about avoiding homonyms where the word has more than one word class, e.g. Filter, (and actually, Set, although to my mind Set would not really be used in a class name as a verb, so I interpret it as a noun.) I originally wrote this using FilterOutput as an example, but it didn't read well. Using a compound for Filter may help disambiguate - e.g. ImageFilterOutputs (or applying my own adivce, this would be ImageFilterOutputList.)
Avoiding plural forms with class names seems natural when you consider that an instance of a class is itself always one item - "an instance". If we use a plural name, then we get a mismatch - an instance trying to imply that it is multiple things - it itself is just one thing, even if it references multiple other things. The collection naming above builds on this - you have an instance which is a list, a map etc so there is no mismatch.
I'm assuming you are talking about programming language constructs, although the same thinking applies to tables/views. These are understood to involve quantities of items and table names are consequently often singlular (Customer, Order, Item) even though they store multiple rows. Many-to-Many Mapping tables are usually compounds of the entities being related, e.g. relating orders to items - OrderItem. In my experience, using plurals for table names makes the SQL difficult to read.
To sum up, I would avoid plural froms as they make reading harder. There are sure to be cases where they are unavoidable - where using the plural form is more readable than creating a huge name of nested entities and collections, but these are the exception than the rule.

What are the general rules, I should follow?
Make it Clear -- for both visual and aural thinkers.
Make it Specific but Accurate.
Make it pass the "crowded room" or "emergency phone call" test.
To illustrate with the SetsUnion example:
"SetsUnion" is right out; It's easily confused for a typo and speaking it (even in your head) will confuse it for "Set's Union" (Or worse).
The plural is also implied, so the 2nd 's' is redundant.
SetUnion is better but still ambiguous.
UnionOfSets is clearer and should be the bare minimum standard.
But all of these, so far, are uselessly vague (unless you are working with pure mathematical theory).
The term really should be specific. For example, "Red cars", "Programmers who spent too much time on esoterica", etc.
These are all unions of sets, but they tell you something useful. ;-)
.
Finally, Phil Factor had the right of it. To paraphrase:
Can you shout a (term) out across a crowded room and have it keyed in, and successfully (used), by a listener at the other side?
Try yelling, "SetsUnion," or even, "UnionOfSets," across a packed Irish bar. ;-)

1) i would use SetUnion and SegmentIntersection because i think in this case the plurality is implied anyway and it just looks nicer that way.
2) again, i would use FilterOutputs and VectorDirections, for the same reason. you could always use MultipleFilterOutputs if you want to be more specific.
but ultimately it's entirely down to your personal preference.

I think that while general naming conventions and consistency are important, but in a very very tight/tricky algorithm, clarity should trump convention. If it helps, use veryLongAndDescriptiveIdentifiers.

What's wrong with Union()?
Moreover, "union of sets" turns into "sets' union" (the two sets' union is ...); I'm sure I'm not the only person who's okay with CamelCase but not CamelsCaseMinusApostrophes. If it needs an apostrophe to make sense, don't use it. Set.Union() reads exactly like "union of set(s)".
Mathematations will also say "the (set) union of A and B", or rarely "A and B's (set) union". "The sets' union of A and B" makes no sense!
Most people will also see Vector[] vectors and Directions[] vectorDirections and assume that vectors[i] corresponds to vectorDirections[i]. If things really get ambiguous, I use something like vector_by_index and vectorDirection_by_index. Then you can have Map<Filter,Output> output_by_filter or Map<Filter,Output[]> outputs_by_filter, which makes it very obvious what the key is (this is very important in Objective-C where it's completely non-obvious what type the keys or values are).
If you really want, you can add an s and get vectors_by_index, but then consistency gives you the silly outputss_by_filter.
The right thing is, of course, something like struct FilterState { Filter filter; Output[] outputs; }; FilterState[] filterStates;.

I'd suggest singular for the first word: SetUnion, VectorDirections, etc.
Do a quick class search in your IDE, for: Strings*, Sets*, Vectors*, Collections*
Anyway, whatever you choose, be consistent throughout the whole application.

Related

Can different program implementations have the same program semantics?

So for any given language, if we implement the same program(i.e same output for any given input) twice, using different syntax (i.e. using i++ instead of i+1) will the two programs have the same semantics? Why?
Does the same apply in case where we use different constructs (i.e. Arrays vs Arraylists)?
Thanks
Yes. Depending on the programming language, there can be (combinations of) different syntax constructs with identical semantics.
For example, we can define a programming language with 3 constructs: A and B, both of which are semantically equivalent, and composition (e.g XY for any X and Y where any of these can either be A, B or any composition thereof). Hence program A is equivalent to program B. Also AA is equal to AB, BA and BB etc.
Further, if we extend the language with C which is semantically equivalent to AA, then, for example, BC is equivalent to AAA etc.
So for any given language, if we implement the same program(i.e same output for any given input) twice, using different syntax (i.e. using i++ instead of i+1) will the two programs have the same semantics?
That question is a tautology. The answer is yes. Obviously.
If two different programs produce the same results for all possible input sets, then they do have the same semantics. By definition1.
Why?
Because that is what "same semantics" means!
Does the same apply in case where we use different constructs (i.e. Arrays vs Arraylists)?
Yes.
(One data structure might use more memory, and that might cause an OOME for one version and not the other ... for certain input datasets. But then I would argue that the programs DO NOT produce the same results for all possible inputs.)
Note that this applies to all practical programming languages. Any programming language where there are programs that can only be written one way ... is probably too restrictive to be usable.
1 - OK, so anyone who has studied programming semantics would probably have a fit when they read that. But I am trying to provide an intuitive explanation rather than one that has a decent mathematical foundation. Horses for courses ... as they say.

Additional PlanningEntity in CloudBalancing - bounded-space situation

I successfully amended the nice CloudBalancing example to include the fact that I may only have a limited number of computers open at any given time (thanx optaplanner team - easy to do). I believe this is referred to as a bounded-space problem. It works dandy.
The processes come in groupwise, say 20 processes in a given order per group. I would like to amend the example to have optaplanner also change the order of these groups (not the processes within one group). I have therefore added a class ProcessGroup in the domain with a member List<Process>, the instances of ProcessGroup being stored in a List<ProcessGroup>. The desired optimisation would shuffle the members of this List, causing the instances of ProcessGroup to be placed at different indices of the List List<ProcessGroup>. The index of ProcessGroup should be ProcessGroup.index.
The documentation states that "if in doubt, the planning entity is the many side of the many-to-one relationsship." This would mean that ProcessGroup is the planning entity, the member index being a planning variable, getting assigned to (hopefully) different integers. After every new assignment of indices, I would have to resort the list List<ProcessGroup in ascending order of ProcessGroup.index. This seems very odd and cumbersome. Any better ideas?
Thank you in advance!
Philip.
The current design has a few disadvantages:
It requires 2 (genuine) entity classes (each with 1 planning variable): probably increases search space (= longer to solve, more difficult to find a good or even feasible solution) + it increases configuration complexity. Don't use multiple genuine entity classes if you can avoid it reasonably.
That Integer variable of GroupProcess need to be all different and somehow sequential. That smelled like a chained planning variable (see docs about chained variables and Vehicle Routing example), in which case the entire problem could be represented as a simple VRP with just 1 variable, but does that really apply here?
Train of thought: there's something off in this model:
ProcessGroup has in Integer variable: What does that Integer represent? Shouldn't that Integer variable be on Process instead? Are you ordering Processes or ProcessGroups? If it should be on Process instead, then both Process's variables can be replaced by a chained variable (like VRP) which will be far more efficient.
ProcessGroup has a list of Processes, but that a problem property: which means it doesn't change during planning. I suspect that's correct for your use case, but do assert it.
If none of the reasoning above applies (which would surprise me) than the original model might be valid nonetheless :)

In UML/ER diagramming, how to notate & make transaction requirements with entity that has different values depending on a certain attribute?

I am diagramming an art museum system, where there are Permanent_Art_Objects. Each Permanent_Art_Object has many attributes, and can also be either a 1) Sculpture/Statue, 2) Painting, or 3) Other. Depending on whether it's a sculpture/statue, painting, or other, it has sub-attributes unique to itself.
Here is an example of these sub-attributes.
What is the proper notation for showing these 'sub-attributes'?
For example, if Permanent_Art_Object is Other, it has as sub-attributes Type and Style.
Also, how would I make a query to INSERT INTO Permanent_Art_Object VALUES() for a new art object, if there's so much variety??
It all depends on what you are making. If this is purely for a database, I think ERD's are the cleanest way for modeling but a sidenote is that there are atleast 4 types of notations. Below is how I would do it in UML and ERD with the limited context I have.
More info about ERD's:
Basics: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch2.xht
Specialisations: http://web.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch16.xht
Overview of different types: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Cardinalities
My example:

Any mental technique to quickly deduce the required ifs/elses in a program with A LOT of conditional logic?

Often in programming, it is a very common requirement that some piece of functionality will require a lot of conditional logic, but not quite enough to warrant a rules engine.
For example, testing a number is divisible by x but also a multiple of something, a factor of something else, a square root of something, etc. As you can imagine, something along these lines will easily involve a lot of ifs/elses.
While it is possible to reduce the clutter with more modern programming techniques, how do you quickly and, in a calculated fashon, deduce the required ifs/elses?
For example, in a program to deduce the necessary quote for a car insurance prospective customer (rules engines aside btw), there would be conditional logic for age, location, driving points, what age those points are collected at, etc. Is there any mental technique to quickly deduce the redundant conditional branches? Is it just plain experience and no special mental technique? This is important because pair programming there is a lot of noise and thus difficult to actually think something through or even get enough time to implement the idea.
Thanks
I would suggest that trying to do this sort of thing in your head is asking for trouble, and trying to do it with a partner is going to make it much worse. Sometimes you have to sit and think and even make some notes on paper. If you don't like propositional logic, try decision tables.
I would add simple, readable, short methods, such as:
IsMinor(..)
IsRecordClean(..)
And then use them in conjunction to create new methods with meaningful names, such as:
IsMeetingPreReqs(..) //which checks several "simple" conditions
IsValidForInsurance(..)
(Sorry for the examples, I'm struggling with my English here, but you get the point..)
IMO that will make your code much clearer, and thus reduce the chances of being confused by distractions.
Not mental per se, but kinda..
I would say you should use propositional logic.
Suggest that:
q = age is greater than 18
p = location is within 10 miles
r = driving points are less than 3
s = age is less than 18 when points are collected
you could say...
(^ is AND)
if (q ^ p ^ r ^ s) {
//you are eligible or something!
} else {
//get outta here
}

Should we put units of measurements in attribute names? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I think most of us agree that it's a good idea to use a descriptive name for variables, object attributes, and database columns. If you want to store something's name, you may as well call the attribute Name so people know what to put in it.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
My database administrator, however, just told me that including units of measurement in database column names is “frowned upon”. I think that's just nuts, but perhaps there's some risk DBAs know about that I don't.
Throw me a line, here: should we embed units of measurement in our attribute names? Why? Why not?
If you have a consistent UOM for things, then your DBA's policy is OK.
For example, if timespans are ALWAYS in minutes, etc.
If the UOM could change, then you should store it in another column, alongside the qty.
That said, I tend to side with you on this. Clarity trumps most things, including this. I'd rather see DurationMinutes than Duration and have to guess what the UOM is.
Yes. You should.
The key, as #[Charles Bretana] pointed out, is legibility and that the other users of your table or developers following you know what you're using.
I would absolutely involve the units/measurement in a field name - in my business you can't guess what you'll find from the context or name: a field entitled MarketValue - is that in millions, thousands or units? US Dollars, Euros, pounds, $CURRENCY? Is that value a percentage, a ratio? Absolute or relative? Daily, monthly, calendar year, financial year? That timestamp, what time zone is it?
Your first, last and only task when providing data is to ensure that it isn't used incorrectly because the consumer wasn't able to find out enough about it. As developers, throwing "Metre", "USD", "GMT", "Percent" or whatever into a field name isn't the least bit smelly.
There are enormous smells that need resolving before the tiny whiff of field naming needs standardising.
This is why the Mars Climate Orbiter crashed into the surface at 350 meters/sec when it was planned to only handle 350 ft/sec (or something like that).
Although "Never say 'Never' or 'Always'" is, in general, a good rule of thumb, here I will bend my rule and say I think you should "always" make it clear what units a numeric value is in.
The convention of naming all my columns in the format:
{name}_in_{unit}
helped for one project, since I was using si units it actually ended up allowing me to be able to infer the column data type and generally simplify my writing style.
length_in_m
speed_in_ms-1
color_in_nm
there were a few exceptions that I handled either with at_time or number_of:
started_at_time
updated_at_time
number_of_rotations
I think this is a good idea anywhere since there is always room for ambiguity.
For example, the with high performance timer class we use, I keep having to check if the GetElapsed() method returns seconds or milliseconds or something else. If it were called GetElapsedMilliseconds() that would save the confusion.
The only downside being if you wanted to change your mind ... but in that case any clients would need to know about the change anyway.
F# has an interesting twist on this allowing measurement units to be specified in the type system. See this blog post, and another stackoverflow question discussing Are units of measurement unique to F#?
I've done a lot of database work, and I would not frown upon that at all, nor have I heard of frowning on it.
It's better than the extended properties, which is not apparent to the casual developer. It's better than in a separate document, because many developers won't read them, and certainly not in great detail. If the units are set, then having it in the name sounds like a good idea. If that changes, then when the unit field is added, change the name of the measurement field.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
You could go even a step further (in your code, not in the database) and have a Length type, which takes care of the measurement unit and of possible conversions. This is the approach of the "Quantity" pattern in Martin Fowler's "Analysis Patterns" book.
Do not put units of measurement (or column type) in your database column names.
Many Databases have the ability to document/comment columns in some way (in SQL Server it is sp_addextendedproperty), I would suggest that is a more appropriate place.
For Python datetimes, consider using objects from the datetime package. Doing so will capture the unit implicity to microsecond resolution. There is then no basis for including the unit in the variable name.
If you must use an int or float instead, it is strongly recommend to suffix the unit name abbreviation to the variable name. For example, instead of the variable name diff, use diff_secs for seconds, diff_ms for milliseconds, diff_µs for microseconds, or diff_ns for nanoseconds.
We don't put units of measurement in column names in our database. We do, however, have a data dictionary document where all of the columns and relationships are described.
The ideal approach is, if possible, to use a type that leaves no ambiguity as to the measurement. For example in .NET rather than saying int periodInSeconds you'd be much better off using TimeSpan period.
The F# language actually has units of measurement as part of the type system so you can declare types in units such as 10<m/s> and 5<s> and even perform calculations on them so something like 10<m/s> * 5<s> would result in 50<m>. See here for more info.
So I'd say if possible use a type that conveys your intention, but if that isn't possible then you should probably encode the measurement into the name. It's better and more obvious than a comment.
You definitely want units of measurement somewhere. I don't know if the column names are a good place or if the schema is better. Ask your database administrator
Where is the information about units of measure stored?
How can I get access to the units programmatically?
If the answers are "it isn't" or "you can't", complain bitterly---they have no right to deny you your naming convention. Otherwise, all may be happier if you work within the system.
P.S. I really like the support for units of measure that they've put into F#.
I have to say, I hate "descriptive" variable names becoming "incredibly verbose" variable names.
My preferred alternative is to use nothing but the unit-of-measure names in short functions. Eg.
function velocity(m, s) {
return m/s;
}
You don't need to say "length_m" because in this context, it's obvious that only lengths are measurable in metres.
Having said that. If I was writing a system where units of measure errors were really dangerous, I'd probably use the type system and define a Length class which always converted itself into a standard unit for any calculation. Maybe even different sub-classes for Feet, Metres etc.
NO, the name of the attribute is seperate from its unit of measurement.
If you call a variable length_mm then you are tied to mm.
what if you use a 32bit int to store length_mm, eventually the length in mm may get larger then 62,000, or whatever the limit is on 32bit ints. You cant switch over to m cause you tied you length variable to length_mm.
I think putting units in your identifiers is a huge design smell. It almost surely means that you chose the wrong language: if units are so important to the project, you'd better be using a language whose type system is capable of representing them.