Why is Syntactic Sugar sometimes considered a bad thing? [closed] - language-design

Syntactic sugar, IMHO, generally makes programs much more readable and easier to understand than coding from a very minimalistic set of primitives. I don't really see a downside to good, well thought out syntactic sugar. Why do some people basically think that syntactic sugar is at best superfluous and at worst something to be avoided?
Edit: I didn't want to name names, but since people asked, it seems like most C++ and Java programmers, for example, frankly don't care about their language's utter lack of syntactic sugar. In a lot of cases, it's not necessarily that they like other parts of the language enough to make the lack of sugar worth the tradeoff; it's that they really don't care. Also, Lisp programmers seem almost proud of their language's strange notation (I won't call it syntax because it technically isn't), though in this case it's more understandable, because it allows Lisp's metaprogramming facilities to be as powerful as they are.

Syntactic sugar can in some cases interact in unpleasant ways.
Some specific examples:
The first is C# (or Java) specific: autoboxing combined with the lock/synchronized construct.
private int i;
private object o = new object();
private void SomethingNeedingLocking(bool b)
{
object lk = b ? i : o;
lock (lk) { /* do something */ }
}
In this example the helpful lock construct, which can use any object as a synchronization point, combines with autoboxing to produce a possible bug: the lock is simply taken on a new boxed copy of i each time, so it never actually excludes anything. It is arguable that the lock construct is overly helpful and that some more specific construct to lock on would be better, but the combination is certainly still flawed.
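A minimal sketch of the usual way to sidestep this (my illustration, not from the original answer): lock on a single dedicated reference-type field, so boxing can never silently manufacture a fresh lock target on each call.
private readonly object syncRoot = new object();
private void SomethingNeedingLocking(bool b)
{
    // always the same reference, regardless of b, so the lock actually excludes
    lock (syncRoot) { /* do something with i */ }
}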
Multiple variable declaration and pointers:
long* first, second;
A classic bug (though easy to spot): the multiple-declaration sugar doesn't combine with the pointer declarator, so first is a pointer to long while second is just a long.
Some constructs do not need other aspects of the sugar to cause issues; a classic example is the ++ operator, which neatly lets you avoid writing
i = i + 1;
A widely used construct (and one which itself has scope for bugs, since you must remember to update the variable on both sides of the assignment if you switch away from using i). However, since ++ is easy to embed within other expressions, the issue of prefix and postfix rears its head.
When used on its own in a for loop this doesn't matter, since the evaluation happens outside of any other evaluation, but used elsewhere it can be a source of confusion, because a very important aspect of the calculation (whether the current or the next value should be used) is packed into a very small and easily missed form.
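A small sketch of the distinction (C# here, but C, C++ and Java behave the same way for this case); the variable names are just for illustration:
int i = 5;
int a = i++ * 2;   // postfix: the old value 5 is used, so a == 10, then i becomes 6
int j = 5;
int b = ++j * 2;   // prefix: j is incremented to 6 first, so b == 12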
All the above (except perhaps the lock/box one, which the compiler really should spot for you) are cases where the usage may well be fine, or where experienced programmers may think "that's perfectly clear to me", but the scope for confusion exists, certainly for novice programmers or for those moving to a different syntax.

"Syntactic sugar causes cancer of the semicolon." (Alan Perlis)
It is difficult to reason about syntactic sugar if the reasoning takes place without reference to a context. There are lots of examples about why "syntactic sugar" is good or bad, and all of them are meaningless without context.
You mention that syntactic sugar is good when it makes programs readable and easier to understand... and I can counter by saying that sometimes syntactic sugar can affect the formal structure of a language, especially when it is a late addendum during the design of a programming language.
Instead of thinking in terms of syntactic sugar, I like to think in terms of well-designed languages that foster readability and ease of understanding, and badly designed languages.

Too much unnecessary sugar just adds bloat to a language. I would name names but then I would just get flamed. :) Also, sometimes languages employ syntactic sugar instead of doing a real implementation; for instance, there is a language that shall remain nameless whose "generics implementation" is just a thin layer of syntactic sugar.

Nonsense. C and Lisp programmers use syntactic sugar all the time.
Examples:
a[i] instead of *(a+i)
'(1 2 3) instead of (quote (1 2 3))

Syntax, in general, makes a language hard to learn, let alone master. Therefore, the smaller the set of syntax, the easier it is to learn and to try to master. This is a major reason why many new languages borrow the syntax from popular, existing languages.
Also, while I can simply avoid learning certain features I'm not interested in for whatever reason, I'll eventually find myself reading code by someone who does like that feature, and then I'll need to go learn it just to understand their code.

Syntactic sugar can make your program either more understandable or less so. If you add syntactic sugar for trivial things, you just add cognitive burden, because the language becomes more complicated. On the other hand, if you can add syntactic sugar that manages to pinpoint a specific concept and highlight it, then you can win.

Personally, I've always found the term "syntactic sugar" ambiguous. I mean if you want to get technical, just about anything other than basic arithmetic, an if statement, and a goto is syntactic sugar.
I think what most people mean when they dismiss "syntactic sugar" is that a language feature makes something complicated look deceptively simple. The most notorious example of this is Perl. But since I'm not a Perl expert, I'll give you an example of what I'm talking about in Python (taken from this question):
reduce(list.__add__, map(lambda x: list(x), [mi.image_set.all() for mi in list_of_menuitems]))
This is an obvious attempt at making something simpler that has gone horribly, horribly wrong.
That's not to say I'm on the side of removing such features though. I think that such features just need to be used carefully.

I have always understood "syntactic sugar" to refer to any syntax added to an existing language that does not extend the capabilities of the language. Otherwise, anything less direct than binary machine language could be called syntactic sugar.
Even though they do not extend the capabilities of a language, they can still be very useful.
For example, LINQ is syntactic sugar because it doesn't add any new capabilities to C# 3 that were not already possible in C# 2; but doing the same thing as a simple LINQ expression in C# 2 takes vastly more code and is much harder to read.
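A rough sketch of that comparison (my own illustration; Person, people and the property names are made up for the example):
// C# 3 with LINQ (requires using System.Linq; sugar over extension methods and lambdas)
List<string> adultNames = people.Where(p => p.Age >= 18).Select(p => p.Name).ToList();
// Roughly the equivalent C# 2 code
List<string> adultNames2 = new List<string>();
foreach (Person p in people)
{
    if (p.Age >= 18)
    {
        adultNames2.Add(p.Name);
    }
}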
Conversely, generics are not syntactic sugar, because you can do things with them in C# 2 that were impossible with C# 1, such as creating a collection class that can contain any value type without boxing.
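A tiny illustration of that boxing point (a sketch, not part of the original answer):
// C# 1 style: ArrayList (System.Collections) stores object, so every int added is boxed
ArrayList oldStyle = new ArrayList();
oldStyle.Add(5);              // boxes the int on the heap
// C# 2: List<int> (System.Collections.Generic) stores the ints directly
List<int> newStyle = new List<int>();
newStyle.Add(5);              // no box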

See the Law of Leaky Abstractions: with too much sugar you just use it without knowing what is going on underneath, and that makes it increasingly hard to debug when something does go wrong. It's not so much that "syntactic sugar" is a bad thing, just that a lot of programmers rely on it without really being aware of what they are shielded from, and then if the syntactic sugar runs into problems they're screwed.

Possibly because it leads to confusion in programmers who don't know what is really happening behind the scenes, which could in turn lead to inefficient or poorly written code. Just a guess; I don't think it is a "bad thing" either.

It's more typing and more layers of abstraction. I'd much rather use a language that is designed to have higher levels of abstraction than a language with syntactic sugar tacked on to do a poor job of imitating features other languages have built in.

Related

Is weak typing a performance increase or a decrease?

When writing interpreted languages, is it faster to have weak typing or strong typing?
I was wondering this because the faster dynamically typed interpreted languages out there (Lua, JavaScript), and in fact most interpreted languages, use weak typing.
But on the other hand, strong typing gives guarantees weak typing does not, so are optimization techniques possible with one that aren't possible with the other?
With strongly typed I mean no implicit conversions between types. For example, this would be illegal in a strongly typed language but (possibly) legal in a weakly typed one: "5" * 2 == 10. JavaScript especially is notorious for these type conversions.
It seems to me that the question is going to be hard to answer with explicit examples because of the lack of "strongly typed interpreted languages" (using the definitions I understand from the question comments).
I cannot think of any language that is interpreted and does not have implicit conversions, and I think that is so for two reasons:
Interpreted languages tend not to be statically typed. I think this is because, if you are going to implement a statically typed language then, historically, compilation is relatively easy and gives you a significant performance advantage.
If a language is not statically typed then it is forced into having implicit conversions. The alternative would make life too hard for the programmer (they would have to keep track of types, invisible in the source, to avoid runtime errors).
So, in practice, all interpreted languages are weakly typed. But the question of a performance increase or decrease implies a comparison with some that are not. At least, it does if we want to get into a discussion of different, existing implementation strategies.
Now you might reply "well, imagine one". OK. Then you are asking for the performance difference between code that detects the need for a conversion at runtime and code where the programmer has explicitly added the conversion. In that case you are comparing dynamically detecting the need for a conversion against calling an explicit function specified by the programmer.
On the face of it, detection is always going to add some overhead (in a [late-]compiled language that can be ameliorated by a JIT, but you are asking about interpreters). But if you want fail-fast behaviour (type errors) then even the explicit conversion has to check types. So in practice I imagine the difference is relatively small.
And this feeds back to the original point: since the performance cost of weak typing is low (given all the other constraints/assumptions in the question), and the usability costs of the alternative are high, most (all?) interpreted languages support implicit conversion.
[Sorry if I am still not understanding. I am worried I am missing something, because the question - and this answer - does not seem interesting...]
[Edit: maybe a better way of asking the same(?) thing would be something like "what are the comparative advantages/disadvantages of the various ways that dynamic (late binding?) languages handle type conversion?", because I think there you could argue that Python's approach is particularly powerful (expressive), while having similar costs to other interpreted languages (and the question avoids having to argue whether Python or any other language is "weakly typed" or not).]
With strongly typed I mean no implicit conversions between types.
"5" * 2 == 10
The problem is that "weak typing" is not a well-defined term, since there are two very different ways such "implicit conversions" can happen, which have pretty much opposite effects on performance:
The "scripting language way": values have a runtime type and the language implicitly applies semantic rules to convert between types (such as formatting a binary number as a decimal string) when an operation calls for a different type. This will tend to decrease performance since it (a) requires type information to be present at runtime and (b) requires that this information be checked. Both requirements introduce overhead.
The "C way": at runtime, it's all just bytes. If you can convince the compiler to apply an operation that takes a 4 byte integer on a string, then depending on how exactly you do it, either the first 4 bytes of that string will simply be treated as if they were a (probably very large) integer, or you get a buffer overrun. Or demons flying out of your nose. This method requires no overhead and leads to very fast performance (and very spectacular crashes).
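For what it's worth, here is a small C# sketch of my own of the first flavour's cost model, using dynamic (which attaches runtime type information to the operands and checks it when the operation is dispatched); it is an analogy for the mechanism, not an example from either answer:
dynamic x = "5";
dynamic y = 2;
// The runtime binder inspects the runtime types of x and y before applying *.
// C# defines no * for string and int, so the check fails here with a
// RuntimeBinderException; a weakly typed scripting language would instead
// convert and succeed, but it pays the same price of carrying and checking
// runtime type information on every operation.
object z = x * y;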

C++0x OOP paradigm shifts?

Are there any and if yes, which ones?
What do you mean by "paradigm shift"?
C++0x introduces many new features that will of course change the way you write programs.
There are little things that will probably have a big impact on the syntax used, but which won't change the semantics that much. Examples are lambda functions and the range-based for loop: they'll provide a better syntax for what we are all already doing.
Then there are big things that will change the way things work. In particular:
Rvalue references could make you think differently about how objects work and how to use them: it will probably become easier (and cheaper) to pass and return objects by value.
Explicit conversion operators will let us define conversion operators safely, whereas doing this in C++03 was risky.
C++0x does not introduce any new paradigms and doesn't change any paradigms.
Edit: The implementation of those paradigms, however, is subject to some pretty big change with variadic templates and rvalue references, just to begin with.
As a matter of fact, I think that yes, there is a paradigm shift. Caveat: I have never written object-oriented code in C++.
The change that may allow a paradigm shift is the standardization of the smart pointer std::shared_ptr. The standard library now finally contains a well-implemented, efficient and probably bug-free shared pointer.
C++ experts know how hard it is to get them right, and that most library implementations of reference-counting pointers probably contain subtle bugs. It’s therefore important to finally have a reliable implementation even if (for some brain-dead reason) the company forbids the use of Boost.
This might have drastic consequences on the number of memory leaks: If object oriented C++ applications stopped leaking memory, that would be a paradigm shift.
On the other hand, companies that use their own smart pointers in OOP code will probably not switch to C++0x in the next ten years anyway.
(Just to emphasize this once more, since it’s been repeatedly misunderstood: I am not referring to the technology of smart pointers as a paradigm shift. I am referring to the complete disappearance of memory leaks in object-oriented architectures.)

Are Traits good or bad?

This is an open-ended question, but I would like to solicit some opinions from the SO community on Traits; do you think Traits in Squeak/Pharo are a good thing, or should you stay away from them and use composition and delegation instead? I ask because while I know how to use them (thanks to the Pharo book), I am not really sure how acceptable it is to use them or where it is OK to use them and where it isn't.
I do not like traits because they introduce strong dependencies into code. These dependencies can be obvious (a class that imports a trait, a trait that expects methods), but also very subtle (a trait that shadows super methods/instance variables). Furthermore there is no adequate tool support for traits.
In my experience delegation gives a much better and more reusable design in a dynamically typed object-oriented language like Smalltalk.
Things have their pros and cons. Lukas rightly mentions many of the cons:
They introduce strong dependencies into code.
There is no adequate tool support.
While the second may go away some day, the first will not.
The purpose of traits is to prevent the code duplication that occurs when two classes that don't share a superclass other than Object share an instance method. Now, sometimes delegation can fix that, but oftentimes it cannot. So, the pro of traits is:
Reduced code duplication.
My verdict here is that the disadvantages outweigh the advantages. I think that, today and forever, code duplication is bound to occur. And when delegation won't do, I can even imagine that code duplication isn't all that harmful, as it often precedes the divergent evolution of the copied code snippets.
I think, the best thing to do, as of today, is to keep automated track of code duplication, and always monitor when one end changes while the other doesn't. I'm currently writing a tool that'll keep track of such links, even across repositories. I'll report on it in my blog when it's ready.

Operator overloading - is it really reasonable to forbid?

Java forbids operator overloading, but coming from C++ I do not see any reason for that. In languages where operator symbols are symbols like any other, the same rules apply to "+" as to "plus" and there is no problem. So what is the point?
Edit: To be more concrete, show me which disadvantage overloaded "+" may have over overloaded "equals".
As with many other things in Java, this is a restriction because it may be confusing if used improperly. (Similarly, pointer arithmetic is forbidden because it is error prone.) I'm a big fan of Java, but I'm generally of the opinion that a feature shouldn't be forbidden just because it could be misused.
For instance, BigInteger would benefit greatly from overloading the + operator.
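For contrast, C#'s System.Numerics.BigInteger does overload the arithmetic operators, which is roughly what this answer wishes Java's BigInteger could do (a sketch for comparison, not part of the original answer):
using System.Numerics;
BigInteger a = BigInteger.Parse("123456789012345678901234567890");
BigInteger b = 42;                 // implicit conversion from int
BigInteger sum = a + b;            // reads like ordinary arithmetic
// The Java equivalent has to spell it out: sum = a.add(BigInteger.valueOf(42));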
OK, I'll try my hand at this under the assumption that Gabriel Ščerbák is doing this for better reasons than railing against a language.
The issue for me is one of manageable complexity: How much of the code in front of me do I have to decode vs. simply read?
In most conventional languages, upon seeing the expression a + b I know what is going to happen. The variables a and b will be added together. I'm pretty confident that behind the scenes the code will be very concise, very fast native machine code that adds the two numbers, whether the numbers are short integers or double-precision or some mixture of the two. (In some languages I may have to also assume that these could be strings being concatenated, but that's a rant for an entirely different question -- but one that flavours this rant if you peer at it from the right angle.)
When I make my own user-defined type -- say the omnipresent Complex type (and why Complex isn't a standard data type in modern languages is way the Hell beyond me, but that, again, is a rant for a different question) -- if I overload an operator (or, rather, if the operator is overloaded for me -- I'm using a library, say), short of peering very closely at the code I will not know that I'm now calling (possibly-virtual) methods on objects instead of having very tight, concise code generated for me behind the scenes. I will not know of the hidden conversions, the hidden temporary variables, the ... well, everything that goes along with writing many operators. To find out what's really going on in my code I have to pay very close attention to every line and keep track of declarations that may be three screens away from my current location in the code. To say that this impedes my understanding of the code flowing before my eyes is an understatement. Important details are being lost because the syntactic sugar is making things taste too tasty.
When I'm forced to use explicit methods on the objects (or even static methods or global methods where that applies) this is a signal to me, while I'm reading, that tells me of the potential cost overheads and bottlenecks and the like. I know, without even having to think for an instant, that I'm dealing with a method, that I've got dispatching overhead, that I may have temporary object creation and deletion overhead, etc. Everything's in front of me right before my eyes -- or at least enough indicators are in front of me that I know to be more careful.
I'm not intrinsically opposed to operator overloading. There are times when it makes code clearer, yes indeed, especially when you have complicated calculations over many baffling expressions. I can understand, however, exactly why someone might not want to put that into their language.
There is a further reason not to like operator overloading from the language designer's viewpoint. Operator overloading makes for very, very, very difficult grammars. C++ is already infamous for being nigh-unparseable and some of its constructs, like operator overloading, are the cause of it. Again from the viewpoint of someone writing the language I can fully understand why operator overloading was left off as a bad idea (or a good idea that's bad in implementation).
(This is all, of course, in addition to the other reasons you've already rejected. I'll submit my own overloading of operator-,() in my old C++ days in that stew just to be really annoying.)
There is no problem with operator overloading itself, but with how it has actually been used. As long as you overload operators in ways that make sense, the language still makes sense, but if you give operators other meanings, you make the language inconsistent.
(One example is how the shift-left (<<) and shift-right (>>) operators have been overloaded in C++ to mean "output" and "input"...)
So, the reasoning when leaving out operator overloading was probably that the risk of misuse was greater than the benefits of having operator overloading.
I think that Java would benefit greatly from extending its operators to cover built-in Number object types. Early (pre-1.0) versions of Java were said to have it (in that there were no primitives - everything was an object) but the VM technology of the time made it prohibitive from a performance view.
But in terms of allowing user-defined operator overloading in general, it is not in the spirit of the Java language. The main problem is simply that it is hard to implement an operator that is consistent with what you expect from mathematics across object types, and it opens the door to a lot of bad implementations, which lead to a lot of hard-to-find (and therefore expensive) bugs. You can just look at how many bad equals implementations (ones that violate the contract) there are in general Java code, and the problem would only get worse from there.
Of course there are languages that prioritize power and syntactical beauty over such concerns, and more power to them. It is just not Java.
Edit: How is a custom + operator different than a custom == implementation (captured in Java in the equals(Object) method)? It isn't, really. It is just that by allowing operator overloading, things that are intuitive to a sixth grader become untrue. The real world experience of equals(Object) implementations shows how such complex contracts become hard to enforce in the real world.
Further Edit: Let me clarify the above, as I shortened it while editing and lost the point. A + operator in math has certain properties, one of which is that it doesn't matter in which order the operands appear: the result is the same. So consider even the simplest case of a + performing an add to a Collection:
Collection a = ...
Collection b = ...
a + b;
System.out.println(a);
System.out.println(b);
The intuitive understanding of + would lead to an expectation that a + b and b + a give the same result, but of course they would not. Start mixing two object types that take each other as parameters in their plus methods (say Collection and String) and things get even harder to follow.
Now certainly it is possible to design operators on objects which are well understood and lead to better, more readable and more understandable code than without them. But the point is that more often than not in home-grown corporate APIs what you would end up seeing is obfuscated code.
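A hypothetical C# sketch (my own, not from the answer) of how an overloaded + that mutates its left operand breaks the grade-school intuition that + is symmetric and side-effect free:
using System.Collections.Generic;
class Bag
{
    public List<string> Items = new List<string>();
    // Legal, but it violates the expectation that a + b and b + a mean the same
    // thing and that neither operand is changed by the expression.
    public static Bag operator +(Bag left, Bag right)
    {
        left.Items.AddRange(right.Items);   // side effect on the left operand
        return left;
    }
}
// After a + b, a holds everything and b is untouched; b + a does the reverse.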
There are a few problems:
Overloading the logical operators (&& and ||) changes behaviour, because the built-in versions are lazily (short-circuit) evaluated while overloaded versions evaluate both operands.
Even in mathematical types there are ambiguities: is (3dpoint * 3dpoint) a cross product or a scalar product?
You can't define new operators, so people reuse existing operators in novel ways, e.g. "string1 % string2" to mean split string1 on string2.
But you can't always protect idiots from themselves even with an outright ban.
The point is that whenever you see, for example, a plus sign being used in the code, you know exactly what it does given that you know the types of its operands (which you always do in Java, as it is strongly typed).

Why stick to get-set and not car.speed() and car.speed(55) respectively?

Apart from unambiguous clarity, why should we stick to:
car.getSpeed() and car.setSpeed(55)
when this could be used as well :
car.speed() and car.speed(55)
I know that get() and set() are useful to keep any changes to the data member manageable by keeping everything in one place.
Also, obviously, I understand that car.speed() and car.speed(55) are the same function, which makes this wrong, but then in PHP and also in Zend Framework, the same action is used for GET, POST, postbacks.
In VB and C# there are "properties", and they are used by many, much to the disgust of purists I've heard, and there are things in Ruby like 5.times and .each, .to_i etc.
And you have operator overloading, multiple inheritance, virtual functions in C++, certain combinations of which could drive anyone nuts.
I mean to say that there are so many paradigms and ways in which things are done that it seems odd that nobody has tried the particular combination that I mentioned.
As for me, my reason is that it is shorter and cleaner to read the code.
Am I very wrong, slightly wrong, is this just odd and so not used, or what else?
If I still decide to stay correct, I could use car.speed() and car.setSpeed(55).
Is that wrong in any way (just omitting the "get" )?
Thanks for any explanations.
If I called car.speed(), I might think I am telling the car to speed, in other words to increase speed and break the speed limit. It is not clearly a getter.
Some languages allow you to declare const objects, and then restrict you to calling only functions that do not modify the object's data. So it is necessary to have separate functions for modification and read operations. While you could use overloads on parameters to get two functions, I think it would be confusing.
Also, when you say it is clearer to read, I can argue that I have to do a look ahead to understand how to read it:
car.speed()
I read "car speed..." and then I see there is no number so I revise and think "get car speed".
car.getSpeed()
I read "for this car, get speed"
car.setSpeed(55)
I read "for this car, set speed to 55"
It seems you have basically cited other features of the language as being confusing, and then used that as a defense for making getters/setters more confusing? It almost sounds like you are admitting that what you have proposed is more confusing. These features are sometimes confusing because of how general-purpose they are. Sometimes abstractions can be more confusing, but in the end they often serve the purpose of being more reusable. I think if you wanted to argue in favor of speed() and speed(55), you'd want to show how that can enable new possibilities for the programmer.
On the other hand, C# does have something like what you describe, since properties behave differently as a getter or setter depending on the context in which they are used:
Console.WriteLine(car.Speed); //getter
car.Speed = 55 //setter
But while it is a single property, there are two separate sections of code implementing the getting and the setting, and it is clear that this is a getter/setter and not a function named speed, because properties omit the (). So car.speed() is clearly a function, and car.speed is clearly a property getter.
IMHO the C# style of having properties as syntactic sugar for get and set methods is the most expressive.
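For reference, a minimal sketch of what that sugar expands to (the Car class and field names are illustrative):
class Car
{
    private int speed;
    // Reads like a field at the call site, but the compiler generates a
    // get_Speed() / set_Speed(value) method pair behind the scenes.
    public int Speed
    {
        get { return speed; }
        set { speed = value; }
    }
}
// Usage: car.Speed reads via the getter; car.Speed = 55 writes via the setter.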
I prefer active objects which encapsulate operations rather than getters and setters, so you get semantically richer objects.
For example, although an ADT rather than a business object, even the vector in C++ has paired functions:
size_type capacity() const // how many elements space is reserved for in the vector
void reserve(size_type n) // ensure space is reserved for at least n elements
and
void push_back ( const T& ) // inserts an element at the end
size_type size () const // the number of elements in the vector
If you drive a car, you can set the accelerator, clutch, brakes and gear selection, but you don't set the speed. You can read the speed off the speedometer. It's relatively rare to want both a setter and a getter on an object with behaviour.
FYI, Objective-C uses car.speed() and car.setSpeed(55) (except in a different syntax: [car speed] and [car setSpeed:55]).
It's all about convention.
There is no right answer, it's a matter of style, and ultimately it does not matter. Spend your brain cycles elsewhere.
FWIW I prefer the class.noun() for the getter, and class.verb() for the setter. Sometimes the verb is just setNoun(), but other times not. It depends on the noun. For example:
my_vector.size()
returns the size, and
my_vector.resize(some_size)
changes the size.
The Groovy approach to properties is quite excellent IMHO: http://groovy.codehaus.org/Groovy+Beans
The final benchmarks of your code should be this:
Does it work correctly?
Is it easy to fix if it breaks?
Is it easy to add new features in the future?
Is it easy for someone else to come in and fix/enhance it?
If those 4 points are covered, I can't imagine why anybody would have a problem with it. Most of the "Best Practices" are generally geared towards achieving those 4 points.
Use whichever style works for you, just be consistent about it, and you should be fine.
This is just a matter of convention. In Smalltalk, it's done the way you suggest and I don't recall ever hearing anybody complain about it. Getting the car's speed is car speed, and setting the car's speed to 55 is car speed:55.
If I were to venture a guess, I would say the reason this style didn't catch on is the two lines of descent down which object-oriented programming has come to us: C++ and Objective-C. In C++ (even more so early in its history), methods are very closely related to C functions, and C functions are conventionally named along the lines of setWhatever() and do not have overloading for different numbers of arguments, so that general style of naming was kept. Objective-C was largely preserved by NeXT (which later became Apple), and NeXT tended to favor verbosity in their APIs and especially to distinguish between different kinds of methods: if you're doing anything but just accessing a property, NeXT wanted a verb to make it clear. So that became the convention in Cocoa, which is the de facto standard library for Objective-C these days.
It's convention. Java has a convention of getters and setters, C# has properties, Python has public fields, and JavaScript frameworks tend to use field() to get and field(value) to set.
Apart from unambiguous clarity, why should we stick to:
car.getSpeed() and car.setSpeed(55)
when this could be used as well : car.speed() and car.speed(55)
Because in all languages I've encountered, car.speed() and car.speed(55) are the same in terms of syntax. Just looking at them like that, both could return a value, which isn't true for the latter if it was meant to be a setter.
What if you intend to call the setter but forget to put in the argument? The code is valid, so the compiler doesn't complain, and it doesn't throw an immediate runtime error; it's a silent bug.
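A tiny sketch of that silent bug, assuming the overloaded speed() style on a hypothetical car object:
car.speed(55);   // setter: sets the speed to 55
car.speed();     // the argument was forgotten: this compiles, silently calls the
                 // getter instead, and the returned value is simply discarded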
.() means it's a verb.
no () means it's a noun.
car.Speed = 50;
x = car.Speed
car.Speed.set(30)
car.setProperty("Speed",30)
but
car.Speed()
implies a command to exceed the speed limit.