Is weak typing a performance increase or a decrease?

When writing interpreted languages, is it faster to have weak typing or strong typing?
I was wondering this because the faster dynamically typed interpreted languages out there (Lua, JavaScript), and in fact most interpreted languages, use weak typing.
But on the other hand strong typing gives guarantees weak typing does not, so, are optimization techniques possible with one that aren't possible with the other?
By strongly typed I mean no implicit conversions between types. For example, this would be illegal in a strongly typed language, but (possibly) legal in a weakly typed one: "5" * 2 == 10. JavaScript in particular is notorious for these type conversions.

it seems to me that the question is going to be hard to answer with explicit examples because of the lack of "strongly typed interpreted languages" (using the definitions i understand from the question comments).
i cannot think of any language that is interpreted and does not have implicit conversions. and i think this is for two reasons:
interpreted languages tend not to be statically typed. i think this is because if you are going to implement a statically typed language then, historically, compilation is relatively easy and gives you a significant performance advantage.
if a language is not statically typed then it is forced into having implicit conversions. the alternative would make life too hard for the programmer (they would have to keep track of types, invisible in the source, to avoid runtime errors).
so, in practice, all interpreted languages are weakly typed. but the question of a performance increase or decrease implies a comparison with some that are not. at least, it does if we want to get into a discussion of different, existing implementation strategies.
now you might reply "well, imagine one". ok. so then you are asking for the performance difference between code that detects the need for a conversion at runtime with code where the programmer has explicitly added the conversion. in that case you are comparing the difference between dynamically detecting the need for a conversion and calling an explicit function specified by the programmer.
on the face of it, detection is always going to add some overhead (in a [late-]compiled language that can be ameliorated by a jit, but you are asking about interpreters). but if you want fail-fast behaviour (type errors) then even the explicit conversion has to check types. so in practice i imagine the difference is relatively small.
and this feeds back to the original point - since the performance cost of weak typing is low (given all the other constraints/assumptions in the question), and the usability costs of the alternative are high, most (all?) interpreted languages support implicit conversion.
[sorry if i am still not understanding. i am worried i am missing something because the question - and this answer - does not seem interesting...]
[edit: maybe a better way of asking the same(?) thing would be something like "what are the comparative advantages/disadvantages of the various ways that dynamic (late binding?) languages handle type conversion?" because i think there you could argue that python's approach is particularly powerful (expressive), while having similar costs to other interpreted languages (and the question avoids having to argue whether python or any other language is "weakly typed" or not).]

With strongly typed I mean no implicit conversions between types.
"5" * 2 == 10
The problem is that "weak typing" is not a well-defined term, since there are two very different ways such "implicit conversions" can happen, which have pretty much the opposite effect on performance:
The "scripting language way": values have a runtime type and the language implicitly applies semantic rules to convert between types (such as formatting a binary number as a decimal string) when an operation calls for the different type. This will tend to decrease performance since it A) requires there to be type information at runtime and b) requires that this information be checked. Both of these requirements introduce overhead.
The "C way": at runtime, it's all just bytes. If you can convince the compiler to apply an operation that takes a 4 byte integer on a string, then depending on how exactly you do it, either the first 4 bytes of that string will simply be treated as if they were a (probably very large) integer, or you get a buffer overrun. Or demons flying out of your nose. This method requires no overhead and leads to very fast performance (and very spectacular crashes).


Functional data-structures, OO notions of dispatched equality and comparison, StructuralEquality, and referential transparency

I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issues became immediately apparent:
(1) Swapping out one for the other from an interface perspective proved harder than it seems it should be - i.e., making a shim that let me switch from a Map to a PersistentMap, preserving both the needed module-based let-bound functions and the types necessary to use each map. I know that having full HM type inference (and no type classes) is orthogonal to LSP-style referential transparency for the most part - but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data structures on OO-style dispatched equality and comparison via the IComparable (when 't : comparison), etc., family of interfaces.
Even for OO programs ISTM that the idea of dispatching equality and comparison is a bad idea -- an object "knows" how to perform its own domain-specific tasks, but it doesn't "know" for the most part what notion of equality is going to be necessary at various points in the program for various purposes -- so equality/comparison should not be part of the object's interface, but when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# could be explicit a.StructEq(b) or a ~= b - otherwise you always get object.Equals -- but even stipulating that doing things this way is the best for a multi-paradigm language that's a first-class .Net citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, but this is not the case.
This means (a) that type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines, with it being unclear where the actual "problem" is), and (b) that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), you are likely to silently break (or make inefficient, if one subsumes the other) other parts of the code that depended on the default/previous implementation.
The only way around this that I could think of is wrapping each item in an adapter object using a new ... with object expression - but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data structure that could be loaded if desired (even basics like List, etc.), one that does not depend on dispatched equality/comparison/hashing and does not impose type constraints - all such needs would be met via functions passed in at the time of the call. (Dispatched eq/cmp would only be used for interop with BCL collections that don't accept delegates.) Then we could have an [EqCmpHashThrowNotImplemented] attribute, and I could be sure that there were no default operations happening at all, and I would feel better about the efficiency and predictability of my code. (This also lets one change from a Record to a Class or vice versa without worrying about any changes in behavior due to default implementations.) Again, this would be optional, but enabled with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
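Not F#, but for illustration here is a minimal C++ analogue of the "always pass the comparison at the call site" style being argued for (the names are made up): nothing relies on a type-wide default ordering, so the same item type can be ordered differently in different places without anything silently breaking.

#include <algorithm>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int         age;
    // Deliberately no operator< and no default ordering: every sort or lookup
    // has to say explicitly which notion of ordering it wants.
};

int main() {
    std::vector<Person> people = {{"Ada", 36}, {"Grace", 45}, {"Alan", 41}};

    // One part of the program orders by name...
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.name < b.name; });

    // ...another part orders the same items by age, with no conflict, because
    // neither call site depends on a single dispatched comparison.
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.age < b.age; });
}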
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.

What's the benefit of case-sensitivity in a program language? [duplicate]

Possible Duplicate:
Is there any advantage of being a case-sensitive programming language?
My first programming experiences were with the Basic family (MSX Basic, QBasic, VB).
These are all not case-sensitive. Now, it might be because of these first experiences, but I've never grasped the benefit of a language being case sensitive. On the contrary, I think it is a source of unneeded overhead and bugs, and it still annoys me when I use e.g. Java or C.
Now, I just read about Clojure (a Lisp dialect) and noticed - to my surprise - that one of the differences with Lisp is case-sensitivity.
So: what is actually the benefit (to the programmer) of having a case-sensitive language?
The only things I can think of are:
double the number of symbols
visual feedback and easier reading for complex variables using techniques like CamelCase, e.g. HopCount
However, the first argument doesn't hold, because it is a major source of bugs (it is bad practice to use hopcount and HopCount in one method).
The second argument doesn't hold either, as a decent IDE can provide this in another way. A good example is the VBA IDE, which has a very good approach: the language is case-insensitive, but as soon as you type a variable it will change it to the case used in its definition. For example, if you defined Dim thisIsMyVariable As String, it will change any occurrence of thisismyvariable into thisIsMyVariable. That provides the programmer with an immediate clue that the variable was actually typed in correctly (because it changed appearance).
Edit: added ... benefit to the programmer ...
One point is, like you said, visual aid. Most programming languages (and even frameworks) have conventions on how to capitalize variables, names, etc.
Also, it enforces using uniform names everywhere, so you don't have a mess with the same variable referred to as "var", "Var" or even "VaR".
I can't remember ever having bugs related to capitalization, so that point seems kind of contrived to me.
Using 2 variables of the same name but different capitalization to me sounds like a conscious attempt to shoot yourself in the foot. Different capitalization conventions almost everywhere signify objects of completely different type (classes, variables, methods and so on), so it's pretty hard to make such a mistake due to the completely different semantics.
I'd like to think of it in this way: what do we gain by NOT having case-sensitivity?
We introduce ambiguity, we encourage sloppiness and poor style.
This is a slightly subjective matter of course.
Many naming conventions demand that symbols denoting objects from different semantic classes (types, functions, variables) have their own name-casing rules. In Java, for example, type names always begin with an upper-case letter, while variables, member function names etc. begin with a lower-case letter. This effectively puts type names in a different namespace and gives a visual clue as to what a statement actually means.
// declare and initialize a new Point
Point point = new Point();
// calls a static member function of type Point
Point.fooBar();
// calls a member function of Point
point.moveTo(x, y);

C++0x OOP paradigm shifts?

Are there any and if yes, which ones?
What do you mean by "paradigm shift"?
C++0x introduces many new features that will of course change the way you write programs.
There are little things that will probably have a big impact on the syntax used, but which won't change the semantics that much. Examples are lambda functions and range-based for-loop: they'll provide a better syntax for what we all are already doing.
Then there are big things that will change the way things work. In particular:
Rvalue references could make you think in a different way about how objects work and how to use them: it'll probably be easier to pass (and return) objects by value.
Explicit conversion operators will let us define conversions that apply only when explicitly requested, whereas doing this safely in C++03 was risky.
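For illustration, a minimal sketch of both points (C++0x/C++11, illustrative names only): returning by value becomes cheap because the contents are moved rather than copied, and an explicit conversion operator only fires when you actually ask for it.

#include <string>
#include <vector>

std::vector<std::string> makeWords() {
    std::vector<std::string> words{"alpha", "beta", "gamma"};
    return words;   // moved (or elided), not deep-copied: returning by value is cheap
}

struct Handle {
    void* ptr = nullptr;
    // Explicit conversion operator: usable in an if-condition, but it will not
    // silently convert in arithmetic or comparisons the way C++03 workarounds could.
    explicit operator bool() const { return ptr != nullptr; }
};

int main() {
    std::vector<std::string> w = makeWords();   // pass/return by value as the natural style
    Handle h;
    if (h) { /* fine: contextual conversion to bool */ }
    // int x = h + 1;                            // error: no implicit conversion, by design
    (void)w;
}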
C++0x does not introduce any new paradigms and doesn't change any paradigms.
Edit: The implementation of those paradigms, however, is subject to some pretty big change with variadic templates and rvalue references, just to begin with.
As a matter of fact, I think that yes, there is a paradigm shift. Caveat: I have never written object-oriented code in C++.
The change that may allow a paradigm shift is the standardization of the smart pointer std::shared_ptr. The standard library now finally contains a well-implemented, efficient and probably bug-free shared pointer.
C++ experts know how hard it is to get them right, and that most library implementations of reference-counting pointers probably contain subtle bugs. It’s therefore important to finally have a reliable implementation even if (for some brain-dead reason) the company forbids the use of Boost.
This might have drastic consequences on the number of memory leaks: If object oriented C++ applications stopped leaking memory, that would be a paradigm shift.
On the other hand, companies that use their own smart pointers in OOP code will probably not switch to C++0x in the next ten years anyway.
(Just to emphasize this once more, since it’s been repeatedly misunderstood: I am not referring to the technology of smart pointers as a paradigm shift. I am referring to the complete disappearance of memory leaks in object-oriented architectures.)
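For reference, a minimal sketch of the std::shared_ptr usage being referred to (the class name is made up):

#include <memory>

struct Widget { /* some resource-owning class */ };

std::shared_ptr<Widget> makeWidget() {
    return std::make_shared<Widget>();   // one allocation; the reference count is managed for you
}

int main() {
    auto w1 = makeWidget();
    auto w2 = w1;     // shared ownership: the count is now 2
    w1.reset();       // the count drops to 1
}                     // w2 goes out of scope, the count hits 0, the Widget is destroyed - no leak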

Operator overloading - is it really reasonable to forbid?

Java forbids operator overloading, but coming from C++ I do not see any reason for that. In languages where operator symbols are symbols like any other, the same rules apply to "+" as to "plus" and there is no problem. So what is the point?
Edit: To be more concrete, show me which disadvantage overloaded "+" may have over overloaded "equals".
Just as with many other things in Java, this is a restriction imposed because the feature may be confusing if used improperly. (Similarly, pointer arithmetic is forbidden because it is error-prone.) I'm a big fan of Java, but I'm generally of the opinion that a feature shouldn't be forbidden just because it could be misused.
For instance, BigInteger would benefit greatly from overloading the + operator.
OK, I'll try my hand at this under the assumption that Gabriel Ščerbák is doing this for better reasons than railing against a language.
The issue for me is one of manageable complexity: How much of the code in front of me do I have to decode vs. simply read?
In most conventional languages, upon seeing the expression a + b I know what is going to happen. The variables a and b will be added together. I'm pretty confident that behind the scenes the code will be very concise, very fast native machine code that adds the two numbers, whether the numbers are short integers or double-precision or some mixture of the two. (In some languages I may have to also assume that these could be strings being concatenated, but that's a rant for an entirely different question -- but one that flavours this rant if you peer at it from the right angle.)
When I make my own user-defined type -- say the omnipresent Complex type (and why Complex isn't a standard data type in modern languages is way the Hell beyond me, but that, again, is a rant for a different question) -- if I overload an operator (or, rather, if the operator is overloaded for me -- I'm using a library, say), short of peering very closely at the code I will not know that I'm now calling (possibly-virtual) methods on objects instead of having very tight, concise code generated for me behind the scenes. I will not know of the hidden conversions, the hidden temporary variables, the ... well, everything that goes along with writing many operators. To find out what's really going on in my code I have to pay very close attention to every line and keep track of declarations that may be three screens away from my current location in the code. To say that this impedes my understanding of the code flowing before my eyes is an understatement. Important details are being lost because the syntactic sugar is making things taste too tasty.
When I'm forced to use explicit methods on the objects (or even static methods or global methods where that applies) this is a signal to me, while I'm reading, that tells me of the potential cost overheads and bottlenecks and the like. I know, without even having to think for an instant, that I'm dealing with a method, that I've got dispatching overhead, that I may have temporary object creation and deletion overhead, etc. Everything's in front of me right before my eyes -- or at least enough indicators are in front of me that I know to be more careful.
I'm not intrinsically opposed to operator overloading. There are times when it makes code clearer, yes indeed, especially when you have complicated calculations over many baffling expressions. I can understand, however, exactly why someone might not want to put that into their language.
There is a further reason not to like operator overloading from the language designer's viewpoint. Operator overloading makes for very, very, very difficult grammars. C++ is already infamous for being nigh-unparseable and some of its constructs, like operator overloading, are the cause of it. Again from the viewpoint of someone writing the language I can fully understand why operator overloading was left off as a bad idea (or a good idea that's bad in implementation).
(This is all, of course, in addition to the other reasons you've already rejected. I'll submit my own overloading of operator,() (the comma operator) from my old C++ days into that stew just to be really annoying.)
There is no problem with operator overloading itself, but with how it has actually been used. As long as you overload operators to make sense, the language still makes sense, but if you give other meanings to operators, it makes the language inconsistent.
(One example is how the shift-left (<<) and shift-right (>>) operators have been overloaded in C++ to mean "output" and "input"...)
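Sketched out, the same symbol carries both meanings, and user code routinely extends the second one:

#include <iostream>

struct Point { int x, y; };

// Overloading << to mean "format a Point onto a stream" - nothing to do with
// shifting bits, which is exactly the repurposing being described.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

int main() {
    int bits = 1 << 4;                                  // here << really is a shift: 16
    std::cout << Point{3, 7} << ' ' << bits << '\n';    // here it means stream output
}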
So, the reasoning when leaving out operator overloading was probably that the risk of misuse was greater than the benefits of having operator overloading.
I think that Java would benefit greatly from extending its operators to cover built-in Number object types. Early (pre-1.0) versions of Java were said to have it (in that there were no primitives - everything was an object) but the VM technology of the time made it prohibitive from a performance view.
But in terms of generally allowing user-defined operator overloading, it is not in the spirit of the Java language. The main problem is simply that it is hard to implement an operator that is consistent with what you expect from mathematics across object types, and it opens the door to a lot of bad implementations, which lead to a lot of hard-to-find (and therefore expensive) bugs. You can just look at how many bad equals implementations (ones that violate the contract) there are in general Java code, and the problem would only get worse from there.
Of course there are languages that prioritize power and syntactical beauty over such concerns, and more power to them. It is just not Java.
Edit: How is a custom + operator different than a custom == implementation (captured in Java in the equals(Object) method)? It isn't, really. It is just that by allowing operator overloading, things that are intuitive to a sixth grader become untrue. The real world experience of equals(Object) implementations shows how such complex contracts become hard to enforce in the real world.
Further Edit: Let me clarify the above, as I shortened it while editing and lost the point. A + operator in math has certain properties, one of which is that it doesn't matter which order the numbers on either side appear - it has the same result. So consider even the simplest case of a + performing an add to a Collection:
Collection a = ...
Collection b = ...
a + b;
System.out.println(a);
System.out.println(b);
The intuitive understanding of + would lead to an expectation that a + b and b + a give the same result, but of course they would not. Start mixing two object types that take each other as parameters in their plus method (say Collection and String) and things get harder to follow.
Now certainly it is possible to design operators on objects which are well understood and lead to better, more readable and more understandable code than without them. But the point is that more often than not in home-grown corporate APIs what you would end up seeing is obfuscated code.
There are a few problems:
Overloaded logical operators (&& and ||) lose their short-circuit (lazy) evaluation, so overloading them changes behaviour.
Even for mathematical types there are ambiguities: is (3dpoint * 3dpoint) a cross product or a scalar (dot) product? (See the sketch after this list.)
You can't define new operators, so people reuse existing operators in novel ways, e.g. "string1 % string2" to mean "split string1 on string2".
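To make the second point concrete, a small sketch (illustrative names): both readings of * are plausible for a 3D vector, and the operator symbol alone cannot tell the reader which one a library chose.

struct Vec3 { double x, y, z; };

// Which of these deserves to be spelled "a * b"? Both are defensible, and
// whichever one a library picks, readers simply have to remember it.
double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Named functions keep the choice visible; an overloaded operator* hides it.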
But you can't always protect idiots from themselves even with an outright ban.
The point is that whenever you see, for example, a plus sign being used in the code, you know exactly what it does given that you know the types of its operands (which you always do in Java, as it is strongly typed).

When should weak types be discouraged?

When should weak types be discouraged? Are weak types discouraged in big projects? If the left side is strongly typed like the following would that be an exception to the rule?
int i = 5
string sz = i
sz = sz + "1"
i = sz
Do any languages support syntax similar to the above? Tell me more about the pros and cons of weak types and related situations.
I think you are confusing "weak typing" with "dynamic typing".
The term "weak typing" means "not strongly typed", which means that the value of a memory location is allowed to vary from what it's type indicates it should be.
C is an example of a weakly typed language. It allows code like this to be written:
typedef struct
{
int x;
int y;
} FooBar;
FooBar foo;
char * pStr = (char *) &foo;  /* the cast is needed to compile; it reinterprets the struct's bytes as chars */
pStr[0] = 'H';
pStr[1] = 'i';
pStr[2] = '\0';
That is, it allows a FooBar instance to be treated as if it was an array of characters.
In a strongly typed language, that would not be allowed. Either a compiler error would be generated, or a run time exception would be thrown, but never, at any time, would a FooBar memory address contain data that was not a valid FooBar.
C#, Java, Lisp, JavaScript, and Ruby are examples of languages where this type of thing would not be allowed. They are strongly typed.
Some of those languages are "statically typed", which means that variable types are assigned at compile time, and some are "dynamically typed", which means that variable types are not known until runtime. "Static vs Dynamic" and "Weak vs Strong" are orthogonal issues. For example, Lisp is a "strong dynamically typed" language, whereas "C" is a "weak statically typed language".
Also, as others have pointed out, there is a distinction between "inferred types" and types specified by the programmer. The "var" keyword in C# is an example of type inference. However, it's still a statically typed construct because the compiler infers the type of a variable at compile time, rather than at runtime.
So, what your question is really asking is:
"What are the relative merits and drawbacks of static typing, dynamic typing, weak typing, strong typing, inferred static types, and user-specified static types?"
I provide answers to all of these below:
Static typing
Static typing has 3 primary benefits:
Better tooling support
A reduced likelihood of certain types of bugs
Performance
The user experience and accuracy of things like IntelliSense and refactoring are improved greatly in a statically typed language because of the extra information that the static types provide. If you type "a." in a code editor and "a" has a static type, then the compiler knows everything that could legally come after the "." and can thus show you an accurate completion list. It's possible to support some scenarios in a dynamically typed language, but they are much more limited.
Also, in a program without compiler errors a refactoring tool can identify every place a particular method, variable, or type is used. It's not possible to do that in a dynamically typed language.
The second benefit is somewhat controversial. Proponents of statically typed languages like to make that claim. Opponents, however, contend that the bugs static typing catches are trivial and would get caught by testing anyway. But you do get notification of things like misspelled variable or method names up front, which can be helpful.
Statically typed languages also enable better "data flow analysis", which when combined with things like Microsoft's SAL (or similar tools) can help find potential security problems.
Finally, with static typing, compilers can do a lot more optimization, and so can produce faster code.
Drawbacks:
The main drawback for static typing is that it restricts the things you can do. You can write programs in dynamically typed languages that you can't write in statically typed languages. Ruby on Rails is a good example of this.
Dynamic Typing
The big advantage of dynamic typing is that it's much more powerful than static typing. You can do a lot of really cool stuff with it.
Another one is that it requires less typing. You don't have to specify types all over the place.
Drawbacks:
Dynamic typing has two main drawbacks:
You don't get as much "hand holding" from the compiler or IDE
It's not suitable for critical performance scenarios. For example, no one writes OS Kernels in Ruby.
Strong typing:
The biggest benefit of strong typing is security. Enforcing strong typing usually requires some kind of runtime support. If a program can prove type safety then a lot of security issues, such as buffer overruns, just go away.
Weak typing:
The big drawback of strong typing, and the big benefit of weak typing, is performance.
When you can access memory any way you like, you can write faster code. For example a database can swap objects out to disk just by writing out their raw bytes, and not needing to resort to things like "ISerializable" interfaces. A video game can throw away all the data associated with one level by just running a single free on a large buffer, rather than running destructors for many small objects.
Being able to do those things requires weak typing.
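A rough sketch of the raw-byte style being described (the struct and file name are invented, and this only works for trivially copyable data - it is exactly the kind of thing strong typing is designed to forbid):

#include <cstdio>

struct LevelRecord {     // plain old data: no pointers, no destructors
    int   id;
    float x, y, z;
    int   score;
};

int main() {
    LevelRecord records[1000] = {};

    // "Serialize" by dumping the raw bytes straight to disk: no ISerializable,
    // no per-object code, just one write of the whole buffer.
    std::FILE* f = std::fopen("level.dat", "wb");
    if (f) {
        std::fwrite(records, sizeof(LevelRecord), 1000, f);
        std::fclose(f);
    }

    // "Throw away the level" in one go: here the buffer simply goes out of
    // scope; with heap allocation it would be a single free() instead of a
    // destructor per object.
}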
Type inference
Type inference allows a lot of the benefits of static typing without requiring as much typing.
User specified types
Some people just don't like type inference because they like to be explicit. This is more of a style thing.
Weak typing is an attempt at language simplification. While this is a worthy goal, weak typing is a poor solution.
Weak typing such as is used in COM Variants was an early attempt to solve this problem, but it is fraught with peril and frankly causes more trouble than it's worth. Even Visual Basic programmers, who will put up with all sorts of rubbish, correctly pegged this as a bad idea and backronymed Microsoft's ETC (Extended Type Conversion) to Evil Type Cast.
Do not confuse inferred typing with weak typing. Inferred typing is strong typing inferred from context at compile time. A good example is the var keyword, used in C# to declare a variable suitable to receive the value of a LINQ expression.
By contrast, weak typing is inferred each and every time an expression is evaluated. This is illustrated in the question's sample code. Another example would be use of untyped pointers in C. Very handy yet begging for trouble.
Inferred typing addresses the same issue as weak typing, without introducing the problems associated with weak typing. It is therefore a preferred alternative whenever the host language makes it available.
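The contrast in C++ terms, as a rough sketch: auto is inferred typing (the type is fixed at compile time and fully checked afterwards), while an untyped void* is the weakly typed escape hatch (the compiler no longer knows or checks what it points to).

#include <string>
#include <vector>

int main() {
    // Inferred typing: 'words' has exactly one type, deduced once at compile
    // time, and every later use is checked against it.
    auto words = std::vector<std::string>{"a", "b"};
    // words += 1;   // error: caught at compile time

    // Weak typing: the untyped pointer compiles, but the compiler can no
    // longer tell that treating it as a double would be nonsense.
    void*   raw = &words;
    double* d   = static_cast<double*>(raw);   // compiles fine; dereferencing it would be trouble
    (void)d;
}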
They should almost always be discouraged. The only type of code that I can think of where it would be required is low-level code that requires some pointer voodoo.
And to answer your question, C supports code like that (except of course for not having a string type), and that sounds like something PHP or Perl would have (but I could be totally wrong on that).
"
When should weak types be discouraged? Are weak types discouraged in
big projects? If the left side is strongly typed like the following
would that be an exception to the rule?
int i = 5 string sz = i sz = sz + "1" i = sz
Does any languages support similar syntax to the above? Tell me more
about pros and cons to weak types and situations related.
"
Perhaps you could program your own library to do that.
In C++ you can use operator overloading (together with converting constructors), which lets you declare a variable of one type and initialize or assign it from a value of another type. That is what makes the statement
std::string str = "Hello World";
work, even though any text between quotes is an array of chars: std::string provides a constructor that takes a const char*.
Specifically, where the variable's type is T and B is the type you want to assign from, you would define a member function such as
T& T::operator= ( const B& s );
(for initialization at the point of declaration, a converting constructor T::T( const B& s ) does the job).
Please note that this is a class's member function.
Also note that you will probably want some way to reverse this manipulation if you want to use it liberally - for example a conversion operator:
T::operator B() const;
C++ is powerful enough to let you make an object that is, in effect, weakly typed; but if you want to treat it as purely weakly typed, you will want to make a single variable type that can be used as any primitive, and use only functions that take a pointer to void.
Believe me, it is a lot easier to use strongly typed programming when it is available.
I personally prefer strongly typed, because I don't need to worry about the errors that come up when I don't know what a variable is meant to hold. For example, if I wanted to write a function to talk to a person - a function that used the person's height, weight, name, number of children, etc. - and you gave me a color instead, I would get an error, because there is no simple algorithm for deriving most of those things from a color.
As far as the pros of weakly typed languages go, you might want to get used to loosely typed programming if you are writing something to be run within another program (e.g. a web browser or a UNIX shell). JavaScript and shell script are weakly typed.
I would suggest that something like assembly language is one of the only hardware-level weakly typed languages, but the flavors of assembly I've seen attach a type to each variable depending on the allocated size, e.g. word, dword, qword.
I hope I gave you a good explanation and did not put any words in your mouth.
Weak types are by their very nature less robust than strong types, because you don't tell the machine exactly what to do - instead the machine has to figure out what you meant. This often works quite adequately, but in general it is not clear what the result should be. What, for example, is a string multiplied by a float?
Does any languages support similar syntax to the above?
Perl allows you to treat some numbers and strings interchangeably. For example, "5" + "1" will give you 6. The problem with this sort of thing in general is that it can be hard to avoid ambiguity: should "5" + 1 be "51" or "6"? Perl gets around this by having a separate operator for string concatenation, and reserving + for numeric addition.
Other languages would have to sort out whether you mean to do a concatenation or an addition, and (if relevant) what type or representation the result will be.
I did ASP/VBScript coding and worked with legacy code without "option strict", which allows weak typing.
It was hell many times, especially in the hands of less experienced programmers. We got all sorts of stupid errors that took ages to diagnose.
One of the stupid examples was like this:
'Config
Dim pass
pass = "asdasd"

If NOT pass = Request("p") Then
    Response.Write "login failed"
    Response.End()
End If
So far so good, but if the user changes pass to a numeric password, guess what: it won't work anymore, because an integer pass does not compare equal to the string pass coming from the query string. I thought it was supposed to work, but it didn't (I can't remember the exact piece of code).
I hate weak typing; instead of a stupid debugging session, I would rather spend a few extra seconds typing the exact type of a variable.
Simply put, in my experience - especially in big projects and especially with inexperienced developers - it's just trouble.