What is the difference between coercion and overloading? - oop

I am confused about the terms. From what I understand, coercion is when the language converts variable types, and overloading is when the language uses the same symbol for more than one purpose.

Let us, for a moment, compare a computer language to a bakery: When the bake function is overloaded, you can have many different ovens (implementations), one for bread, one for pizza, etc. With coercion, you need only one oven, but for some things you want to bake you need a little gadget ("coercion function") to make it fit in that one oven.
So with overloading there are multiple implementations (ovens) determined by the argument type (food), with coercion there are multiple coercion functions (gadgets) from each argument type (food) to one single type (the one that the oven needs)
Whether you have many gadgets, or many ovens, they should be uniquely determined by the type of food. You'll have a problem when you e.g. have an oven for Italian food, and another for food that contains tomatoes: what to do with a pizza? This is what e.g. Haskell programmers know as "overlapping instances" of a typeclass. It can sometimes be solved, e.g. if one oven is strictly more specialized than another (one for Italian food, and another for Tuscan food).

From a very general perspective, the main difference is that whereas overloading is explicit (controlled by you, the programmer), type coercion is implicit (controlled by the compiler).
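To make the distinction concrete, here is a minimal Java sketch. The class and method names are made up for illustration. `describe` is overloaded (two implementations, chosen by argument type), while `half` has a single implementation and relies on the compiler coercing an `int` argument to `double`.

```java
// A minimal sketch; all names here are hypothetical.
class CoercionVsOverloading {
    // Overloading: two implementations share one name, and the compiler
    // picks one based on the static type of the argument.
    static String describe(int x) { return "int version"; }
    static String describe(String x) { return "String version"; }

    // Coercion: only one implementation exists, and it accepts a double.
    static double half(double x) { return x / 2; }

    public static void main(String[] args) {
        System.out.println(describe(3));   // resolves to describe(int)
        System.out.println(describe("3")); // resolves to describe(String)
        // The int 3 is implicitly widened to 3.0 before the call.
        System.out.println(half(3));       // prints 1.5
    }
}
```

In bakery terms, `describe` is two ovens and `half` is one oven plus the int-to-double gadget that the compiler supplies.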

Related

How can one define a language which does not fit in the Chomsky Hierarchy?

I'm asking this question because I've stumbled across the accepted answer of Chomsky Language Types
This quote is referring to Type-0 Grammars:
This means that if you have a language that is more expressive than
this type (e.g. English), you cannot write an algorithm that can list
each an every (and only these) words of the language
As far as I know:
There is no mathematical description for what English is so it is meaningless to argue about where it lands in the hierarchy of formal languages.
If there were, then English would certainly be recognizable by some Type-0 Grammar by virtue of it being defined by a finite amount of reasoning, whether it be axioms, a grammar, anything. (If not, how could someone have defined it if not by a finite number of steps?)
Hence:
We can't start talking about how 'expressive' a grammar needs to be to generate precisely an unknown mathematical object
Therefore my problem:
How can one define a language which does not fit in the Chomsky Hierarchy?
If (?) it takes a finite number of steps for mathematicians to define sets with cardinalities that do not make them recursively enumerable, then grammars must exist which are more expressive than Type-0, since they (mathematicians) have followed a finite number of rules (production rules if you will) to produce a non-RE set. Where are they?
A language is a possibly-infinite set of finite words written with some finite alphabet. Since the alphabet is finite and the length of each word is finite, the words of any language are enumerable, in the sense that there exists an enumeration. In other words, the size of any language is at most countably infinite.
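One way to see that such an enumeration exists is to list the words shortest first, and in alphabet order within each length. The following is only a sketch in Java with hypothetical names. It lists the first few words over a given alphabet, and any particular language over that alphabet is a subset of this list.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch; class and method names are hypothetical.
class WordEnumeration {
    // Returns the first `count` words over `alphabet`, shortest first,
    // with words of equal length in alphabet order.
    static List<String> firstWords(char[] alphabet, int count) {
        List<String> result = new ArrayList<>();
        List<String> currentLength = new ArrayList<>();
        currentLength.add(""); // the empty word is the only word of length 0
        while (result.size() < count) {
            List<String> nextLength = new ArrayList<>();
            for (String word : currentLength) {
                if (result.size() < count) {
                    result.add(word);
                }
                // Every word of length n + 1 is a word of length n plus one symbol.
                for (char symbol : alphabet) {
                    nextLength.add(word + symbol);
                }
            }
            currentLength = nextLength;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [, a, b, aa, ab, ba, bb]
        System.out.println(firstWords(new char[] {'a', 'b'}, 7));
    }
}
```

Every word shows up at some finite position in this listing, which is all that "enumerable" requires here.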
However, since any subset of the Kleene closure of the alphabet is a language, the number of languages is not countably infinite. Hence, there is no enumeration of languages.
The Chomsky hierarchy is based on a formalism which can be expressed as a finite sentence with a finite alphabet (the same alphabet as the language being described, plus a couple of extra symbols). [Note 1] So the number of possible Type 0 grammars is countably infinite, and there cannot be a correspondence between the set of grammars and the set of languages.
However, the existence of languages (i.e. sets) for which no generative grammar exists does not necessarily mean that there is some other way of describing these languages which is "more expressive" than generative grammars. Any description which can be written as a finite string using a finite alphabet can only describe a countable infinity of sets. Whether or not it is the same countable infinity will depend on the formalisms, and in general there will be no algorithm which can demonstrate homomorphism. But some equivalences are known (such as the equivalence with Turing machines, which is a particularly interesting equivalence).
So, we have an interesting little conundrum, which is (of course) related to Gödel's Incompleteness Theorems. That is, there are more languages than ways of describing a language, no matter what system we use to describe a language. So the question "How do we describe a language for which no description is available?" does not have a good answer (and if we answer it, by calling some set "Sue", then there will still be an uncountable infinitude of possible sets for which no name exists).
While all this foraging into infinitudes is interesting, it has a few issues:
It has very little (if anything) to do with programming, so it's questionable whether it's on topic for StackOverflow.
Kurt Gödel and Georg Cantor, the two mathematicians responsible for most of the concepts in this answer, both suffered from severe depression. Just saying.
Notes
Although at first glance it might appear that the alphabet for a Type 0 grammar might be arbitrarily larger than the alphabet of the language being described, that is not actually the case. The grammar's alphabet consists of the target alphabet plus a finite set of non-terminals plus an → symbol; the non-terminals can be written using numbers in any convenient base, say binary. So only three additional symbols are required (and you could reduce that to two by arbitrarily designating one of the non-terminal numbers to be the arrow). (It might seem like you need a third symbol to delimit the names of non-terminals, but you can use a Fibonacci encoding to produce codes which always start with a 1 and never include two consecutive 1s, so that you can use an extra 1 at the beginning to unambiguously mark the start of the symbol.)
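For the curious, the Fibonacci encoding mentioned in the note can be sketched as follows. This is my own illustration in Java with hypothetical names, using the greedy (Zeckendorf) representation written most significant digit first, and it is only meant for small numbers.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch; class and method names are hypothetical.
class FibonacciCode {
    // Writes n >= 1 as a sum of non-adjacent Fibonacci numbers (1, 2, 3, 5, 8, ...),
    // one binary digit per Fibonacci number, largest first.
    // The result starts with 1 and never contains two consecutive 1s.
    static String encode(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<Long> fibs = new ArrayList<>();
        long a = 1, b = 2;
        while (a <= n) {
            fibs.add(a);
            long next = a + b;
            a = b;
            b = next;
        }
        StringBuilder code = new StringBuilder();
        for (int i = fibs.size() - 1; i >= 0; i--) {
            long f = fibs.get(i);
            if (f <= n) {
                code.append('1'); // greedy choice: take the largest Fibonacci number that fits
                n -= f;
            } else {
                code.append('0');
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 5; n++) {
            // The extra leading 1 is the start-of-symbol marker from the note.
            System.out.println(n + " -> 1" + encode(n));
        }
    }
}
```

Because no code contains "11" on its own, the pair formed by the marker and the leading 1 of a code can only occur where a symbol starts.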

Why is Kotlin's Number-class missing operators?

In Kotlin, the Number type sounds quite useful: A type to use whenever I need something numeric.
When actually using it, however, I quickly noticed it is pretty useless: I cannot use any operators on these numbers. As soon as I need to do something with them, I need to explicitly convert them (even for comparing).
Why did the language designers choose to not include operators in the Number specification?
Thinking on this, I noticed it could be tricky to implement Number.plus(n: Number): Number, because n might be of a different type than this.
On the other hand, such implementations do exist in all Number subtypes I checked. And of course they are necessary if I want to type 1 + 1.2, which calls Int.plus(d: Double): Double
The result for me is that I have to call .toDouble() every time I use a number. This makes the code hard to read (compare a.toDouble() < b.toDouble() with a < b).
Is there any technical reason why operators were omitted from Number?
The problem is the implementation of the compareTo method. While it sounds reasonable and easy to add it in the first place, the devil lies in the details:
How would you compare instances of arbitrary Number classes to each other? Kotlin could implement the compare method using toDouble(); however this has problems with equality/precision: How do you compare a BigDecimal to a Double? Using toDouble() on the BigDecimal might lose precision, and two (actually different) BigDecimals might be considered equal using this method.
The mess gets even worse when you start to assume one or both types were supplied by libraries, where you cannot make assumptions on precision etc.
In Java, the Number type is not Comparable either.
Furthermore, some Number values like NaN might not be comparable at all.
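The precision problem is easy to reproduce. Here is a small sketch in Java (on the JVM, Kotlin's Number maps to java.lang.Number, and BigDecimal is the same class); `compareViaDouble` is a hypothetical helper standing in for a compareTo built on toDouble().

```java
import java.math.BigDecimal;

// A minimal sketch; class and method names are hypothetical.
class NumberComparison {
    // Naive comparison of arbitrary Numbers by converting both to double.
    static int compareViaDouble(Number a, Number b) {
        return Double.compare(a.doubleValue(), b.doubleValue());
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("0.1");
        BigDecimal y = new BigDecimal("0.1000000000000000000001");

        // BigDecimal's own comparison sees that x is smaller than y.
        System.out.println(x.compareTo(y));         // prints -1
        // Both values round to the same double, so they look equal.
        System.out.println(compareViaDouble(x, y)); // prints 0
    }
}
```

So a Number-level comparison based on double conversion would silently disagree with the comparison that the concrete type already defines.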
If you need a Number to be comparable, you can easily implement your own compareTo method as an extension function. This has some additional limitations though, as most Number subtypes implement Comparable, and the extension function will lose against that implementation.
Credit for this answer goes to Roland, I only extended his comments (see on the question) into an answer.

What is the purpose of the type system in light of CLOS (Common Lisp)?

It is my understanding that the memory layout of a Common Lisp object (bitwise tagging) is defined by CLOS (classes).
I understand that every class has a corresponding type, but not every type has a corresponding class, because types can be compound (lists). I think that types are like logical constraints, as opposed to classes that are concrete "types" with a tagging scheme.
If this is correct, does the type system serve any other purpose other than being a logical constraint (such as specifying that an integer must be within a certain range, or that an array contains a particular type)?
If this is not correct, what purpose does the type system actually serve in light of CLOS? Thanks.
An object has only one class at a time, whereas it can satisfy multiple types.
The type system is a lattice, where you can compute a least-upper-bound and greatest lower bound of two types (using resp. or, and), and which admits a top type (T) and a bottom type (the NIL type, which is not the same as the NULL type).
An implementation of Common Lisp must be able to determine if a value belongs to a type, and that starts with atomic type specifiers, like character or integer, and grows with compound type specifiers (which can be defined by the user).
But whether this is done using tags or by static analysis is left to the implementation; in practice, CL is such that there are cases where you cannot statically determine the type of an object precisely (other than T), simply because an object can be redefined at a later point: you cannot assume its type is fixed (say: a function; that's why inlining or global declarations may help with type inference).
But if you have a scope in which a type can be guaranteed to be invariant, then the compiler is free to use unboxed data types to store values. Then you don't have tagged data. That is the case for local declaration of types for variables, but also for specialized arrays: once an array is built, its element type does not change over time and in some cases knowing that an array contains only (integer 0 15) elements can be used to pack data more efficiently.
CLOS was added to CL fairly late in the game (and it was not the only object system designed for CL)
Even with CLOS, the type system can be used by the compiler for optimizations and by user to reason about their code.
I think it's important to get away from the implementation of things, and instead concentrate on how the language thinks about them. Clearly the implementation needs to have enough information to know what sort of thing a given object is, and it's going to do that with some kind of 'tag' (which may or may not be some extra bits attached to the object -- some of it might be the leading bits of the address for instance). Below I've called this the 'representational type'. But you really have almost no access to that implementation detail from the language. It's tempting to think that type-of tells you something which maps 1-1 onto the representational type, but that's not true: (type-of (cons 1 2)) is permitted to return (cons integer integer) for instance, and I think it is probably allowed to return (cons integer number) or (cons (integer 1 1) (integer 2 2)). It's unlikely that there are distinct representational types for all of these: indeed there can't be since (type-of 1) can return (integer m n) for an infinite number of values of m & n.
So here's a take on how the language thinks about things, and the differences between classes and types, in CL.
Both the type system and the class system consist of a bounded lattice of types / classes. Being a lattice means that for any pair of objects there is a unique supremum (so, for types, a unique type of which both types are subtypes, and which has no subtypes for which that is true) and infimum (the reverse). Being bounded means there is a top & a bottom type / class.
Classes
Classes are first-class objects (you can store a class in a variable for instance).
All objects (including classes) belong to a class, and there is a well-defined operator to find the immediate class to which any object belongs.
There are a finite number of classes.
The class of an object corresponds fairly closely to its representational type, but not completely (there may be specialized array types which do not have corresponding classes for instance).
Classes can serve as types: (typep 1 (class-of 1)) works, as does (subtypep (class-of 1) '(integer 0 1)) (the answers being t and nil, t respectively).
Types
Types are ways to denote collections of objects with common properties, but they are not themselves objects: they are, if anything, just names for collections of things -- the language specification calls these 'type specifiers'. In particular there are an infinite number of types: think of the type (integer m n) for instance. A small number of this infinitude of types correspond to representational types -- the actual information that tells the system what sort of thing something is -- but obviously most of them do not. There may be representational types which do not have corresponding types.
Types in practice serve three purposes I think.
Type information can tell the system about what representational types to use which can help it check that things are the right representational type and optimise things.
Type information can let the system make inferences which can help things significantly.
Type information can let programmers talk about what sort of things they are dealing with, even when that information is not helpful to the system. The system can treat such declarations as assertions about types which can make programs safer & easier to debug. This is an important reason for types: even if the system does not check them, it is useful for the person reading your code to know that it expects, say, an integer in [0, 30], i.e. an (integer 0 30). Indeed, even if the system does not automatically check declarations you can force checks with, say (check-type x (integer 0 30) ...).
The second case is interesting. Let's say I have something which I have told the system is of type (double-float 0.0d0). This is very unlikely to be more useful in terms of representational type than double-float would be. But if I take the square root of this thing then knowing this type might be very useful indeed: the system can know that the result is a double-float, rather than a (complex double-float), and those types are extremely unlikely to be representationally the same. So the system can use my type declaration to make inferences in this way (and these inferences can cascade through the program). Note that classes can't do this (at least CL's classes can't), and neither can the representational type of an object: you need more information than that.
So yes, types serve a number of very useful purposes which aren't satisfied by classes.
A type is a set of values.
A type specifier is some way to succinctly represent a type.
Implementations may do all kinds of markings and registering in order to help them sort out the types of things, but that is not inherent to the concept of types.
A class is an object describing a set of other objects. Since having a succinct name for such a set (type) is quite useful, Common Lisp registers the class name as a type specifier for the corresponding set of objects. That is the whole relation of types to classes.
The type system defines different objects that do different things. The CLOS system is more so used for methods that define special behaviors for types in a more logical way for some programmers. Coming from Java, the CLOS System was more logical and systematic for me, so it has a role for some programmers. I like to think of the CLOS system as a class in Java such as the Integer class, and the type system similar to primitives in Java. The CLOS system simply helps you extend your objects with methods in a more systematic way than creating a structure imho.

Can different program implementations have the same program semantics?

So for any given language, if we implement the same program (i.e. same output for any given input) twice, using different syntax (e.g. using i++ instead of i+1), will the two programs have the same semantics? Why?
Does the same apply in case where we use different constructs (i.e. Arrays vs Arraylists)?
Thanks
Yes. Depending on the programming language, there can be (combinations of) different syntax constructs with identical semantics.
For example, we can define a programming language with 3 constructs: A and B, both of which are semantically equivalent, and composition (e.g. XY for any X and Y, where each of these can be A, B or any composition thereof). Hence program A is equivalent to program B. Also AA is equivalent to AB, BA and BB etc.
Further, if we extend the language with C which is semantically equivalent to AA, then, for example, BC is equivalent to AAA etc.
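As a sketch, here is a tiny interpreter for that language in Java. The concrete meaning I picked is an assumption (A and B each add 1 to a counter, C adds 2); any meaning where A and B agree and C matches AA would work just as well.

```java
// A minimal sketch; the semantics chosen for A, B and C are an assumption.
class ToyLanguage {
    // Runs a program given as a string of constructs; composition is sequencing.
    static int run(String program, int input) {
        int state = input;
        for (char construct : program.toCharArray()) {
            switch (construct) {
                case 'A':
                case 'B':
                    state += 1; // A and B are different syntax for the same meaning
                    break;
                case 'C':
                    state += 2; // C means the same as AA
                    break;
                default:
                    throw new IllegalArgumentException("unknown construct: " + construct);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run("BC", 0));  // prints 3
        System.out.println(run("AAA", 0)); // prints 3
    }
}
```

BC and AAA are different texts, yet they map every input to the same output.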
So for any given language, if we implement the same program (i.e. same output for any given input) twice, using different syntax (e.g. using i++ instead of i+1), will the two programs have the same semantics?
That question is a tautology. The answer is yes. Obviously.
If two different programs produce the same results for all possible input sets, then they do have the same semantics. By definition [1].
Why?
Because that is what "same semantics" means!
Does the same apply in case where we use different constructs (i.e. Arrays vs Arraylists)?
Yes.
(One data structure might use more memory, and that might cause an OOME for one version and not the other ... for certain input datasets. But then I would argue that the programs DO NOT produce the same results for all possible inputs.)
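To illustrate with the question's own examples, here is a rough Java sketch with hypothetical names: one version uses an array and i++, the other an ArrayList and i = i + 1. Ignoring resource limits, both return the same result for the same input.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch; class and method names are hypothetical.
class SameSemantics {
    // Version 1: fixed-size array, loop counter advanced with i++.
    static int sumWithArray(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Version 2: growable list, loop counter advanced with i = i + 1.
    static int sumWithList(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i = i + 1) {
            values.add(i);
        }
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithArray(10)); // prints 45
        System.out.println(sumWithList(10));  // prints 45
    }
}
```

The list version boxes every element, so it is the one more likely to hit the memory caveat above on very large inputs.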
Note that this applies to all practical programming languages. Any programming language where there are programs that can only be written one way ... is probably too restrictive to be usable.
1 - OK, so anyone who has studied programming semantics would probably have a fit when they read that. But I am trying to provide an intuitive explanation rather than one that has a decent mathematical foundation. Horses for courses ... as they say.

Grammatically correct double-noun identifiers, plural versions

Consider compounds of two nouns, which in natural English would most often appear in the form "noun of noun", e.g. "direction of light", "output of a filter". When programming, we usually write "LightDirection" and "FilterOutput".
Now, I have a problem with plural nouns. There are two cases:
1) singular of plural
e.g. "union of (two) sets", "intersection of (two) segments"
Which is correct, SetUnion and SegmentIntersection or SetsUnion and SegmentsIntersection?
2) plural of plural
There are two subcases:
(a) Many elements, each having many related elements, e.g. "outputs of filters"
(b) Many elements, each having single related element, e.g. "directions of vectors"
Shall I use FilterOutputs and VectorDirections or FiltersOutputs and VectorsDirections?
I suspect the first version is correct (FilterOutputs, VectorDirections), but I think it may lead to ambiguities, e.g.
FilterOutputs - many outputs of a single filter or many outputs of many filters?
LineSegmentProjections - projections of many segments or many projections of a single segment?
What are the general rules I should follow?
There's a grammatical misunderstanding lying behind this question. When we turn a phrase of form:
1. X of Y
into
2. Y X
the Y changes grammatical role from a noun in the possessive (1) to an adjective in the attributive (2). So while one may pluralise both X and Y in (1), one may only pluralise X in (2), because Y in (2) is an adjective, and adjectives do not have grammatical number.
Hence, e.g., SetsUnion is not in accordance with English. You're free to use it if it suits you, but you are courting unreadability, and I advise against it.
Postscript
In particular, consider two other possessive constructions, first the old-fashioned construction using the possessive pronoun "its", singular:
3a. Y, its X
the equivalent plural:
4a. Ys, their X
and their contractions, with 4b much less common than 3b:
3b. Y's X
4b. Ys' X
Here, SetsUnion suggests it is a rendering of the singular possessive type (3) Set's Union (=Set, its Union), where you intended to communicate the plural possessive (4) Sets, their Union (contracted to the less common Sets' Union).
So it's actively misleading.
Unless you're getting hamstrung by a convention driven system (ruby on rails, cakePHP etc), why not use OutputsOfFilters, UnionOfSets etc? They may not be conventional but they may be clearer.
For example, it's pretty clear that ProjectionOfLineSegments and ProjectionsOfLineSegment are different things, or even ProjectionsOfLineSegments...
Using plural forms of nouns can make them more difficult to read.
When you have a number of things, they are usually stored in a data structure (an array, a list, a map, a set, etc.), generically called a collection or abstract data type. The interface to a collection of items is typically part of the programming environment (e.g. Collections in Java and .NET, STL in C++) and is well understood by developers to involve quantities of items.
You can avoid pluralizing your nouns, and make the fact that you are dealing with multiple quantities explicit, and indicate how they are accessed by incorporating the name of the collection. For example,
VectorDirectionList - the vectors and their directions are listed, e.g. some kind of Pair type. Works particularly well if you have a VectorDirection, combining a Vector and a Direction.
VectorDirectionMap - if the vector directions are mapped from vector.
Because it's a collection type, dealing with multiple objects is understood as it is endemic to a collection type. It then puts it in the same class as SetUnion - a union always involves at least 2 sets, and a VectorDirectionList makes it clear there can be more than one VectorDirection.
I agree about avoiding homonyms where the word has more than one word class, e.g. Filter (and actually, Set, although to my mind Set would not really be used in a class name as a verb, so I interpret it as a noun). I originally wrote this using FilterOutput as an example, but it didn't read well. Using a compound for Filter may help disambiguate, e.g. ImageFilterOutputs (or, applying my own advice, this would be ImageFilterOutputList).
Avoiding plural forms with class names seems natural when you consider that an instance of a class is itself always one item - "an instance". If we use a plural name, then we get a mismatch - an instance trying to imply that it is multiple things - it itself is just one thing, even if it references multiple other things. The collection naming above builds on this - you have an instance which is a list, a map etc so there is no mismatch.
I'm assuming you are talking about programming language constructs, although the same thinking applies to tables/views. These are understood to involve quantities of items and table names are consequently often singular (Customer, Order, Item) even though they store multiple rows. Many-to-many mapping tables are usually compounds of the entities being related, e.g. relating orders to items: OrderItem. In my experience, using plurals for table names makes the SQL difficult to read.
To sum up, I would avoid plural forms as they make reading harder. There are sure to be cases where they are unavoidable, where using the plural form is more readable than creating a huge name of nested entities and collections, but these are the exception rather than the rule.
What are the general rules I should follow?
Make it Clear -- for both visual and aural thinkers.
Make it Specific but Accurate.
Make it pass the "crowded room" or "emergency phone call" test.
To illustrate with the SetsUnion example:
"SetsUnion" is right out; It's easily confused for a typo and speaking it (even in your head) will confuse it for "Set's Union" (Or worse).
The plural is also implied, so the 2nd 's' is redundant.
SetUnion is better but still ambiguous.
UnionOfSets is clearer and should be the bare minimum standard.
But all of these, so far, are uselessly vague (unless you are working with pure mathematical theory).
The term really should be specific. For example, "Red cars", "Programmers who spent too much time on esoterica", etc.
These are all unions of sets, but they tell you something useful. ;-)
Finally, Phil Factor had the right of it. To paraphrase:
Can you shout a (term) out across a crowded room and have it keyed in, and successfully (used), by a listener at the other side?
Try yelling, "SetsUnion," or even, "UnionOfSets," across a packed Irish bar. ;-)
1) I would use SetUnion and SegmentIntersection because I think in this case the plurality is implied anyway and it just looks nicer that way.
2) Again, I would use FilterOutputs and VectorDirections, for the same reason. You could always use MultipleFilterOutputs if you want to be more specific.
But ultimately it's entirely down to your personal preference.
I think that while general naming conventions and consistency are important, in a very tight or tricky algorithm clarity should trump convention. If it helps, use veryLongAndDescriptiveIdentifiers.
What's wrong with Union()?
Moreover, "union of sets" turns into "sets' union" (the two sets' union is ...); I'm sure I'm not the only person who's okay with CamelCase but not CamelsCaseMinusApostrophes. If it needs an apostrophe to make sense, don't use it. Set.Union() reads exactly like "union of set(s)".
Mathematicians will also say "the (set) union of A and B", or rarely "A and B's (set) union". "The sets' union of A and B" makes no sense!
Most people will also see Vector[] vectors and Directions[] vectorDirections and assume that vectors[i] corresponds to vectorDirections[i]. If things really get ambiguous, I use something like vector_by_index and vectorDirection_by_index. Then you can have Map<Filter,Output> output_by_filter or Map<Filter,Output[]> outputs_by_filter, which makes it very obvious what the key is (this is very important in Objective-C where it's completely non-obvious what type the keys or values are).
If you really want, you can add an s and get vectors_by_index, but then consistency gives you the silly outputss_by_filter.
The right thing is, of course, something like struct FilterState { Filter filter; Output[] outputs; }; FilterState[] filterStates;.
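In Java, that last suggestion might look roughly like the sketch below. Filter and Output are placeholder types I made up, and countOutputs is only there to show the names in use.

```java
import java.util.List;

// A minimal sketch; all types and names here are hypothetical placeholders.
class NamingSketch {
    static final class Filter {
        final String name;
        Filter(String name) { this.name = name; }
    }

    static final class Output {
        final int value;
        Output(int value) { this.value = value; }
    }

    // One filter together with all of its outputs. The singular class name
    // describes one instance; the plural field name says "many outputs".
    static final class FilterState {
        final Filter filter;
        final List<Output> outputs;
        FilterState(Filter filter, List<Output> outputs) {
            this.filter = filter;
            this.outputs = outputs;
        }
    }

    // Counts the outputs across all filters.
    static int countOutputs(List<FilterState> filterStates) {
        int total = 0;
        for (FilterState state : filterStates) {
            total += state.outputs.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<FilterState> filterStates = List.of(
            new FilterState(new Filter("blur"), List.of(new Output(1), new Output(2))),
            new FilterState(new Filter("sharpen"), List.of(new Output(3))));
        System.out.println(countOutputs(filterStates)); // prints 3
    }
}
```

With this shape there is no FilterOutputs identifier left to argue about: "many filters" lives in filterStates and "many outputs per filter" lives in outputs.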
I'd suggest singular for the first word: SetUnion, VectorDirections, etc.
Do a quick class search in your IDE, for: Strings*, Sets*, Vectors*, Collections*
Anyway, whatever you choose, be consistent throughout the whole application.