(start, end) vs. (start, length) in API design

I've seen two alternative conventions used when specifying a range of indexes, e.g.
subString(int startIndex, int length);
vs.
subString(int startIndex, int endIndex);
They are obviously equivalent in terms of what you can do with them, the only difference being whether you specify the ending index or the length of the range.
I'm assuming that in all cases startIndex would be inclusive, and endIndex exclusive.
Are there any compelling reasons to prefer one over the other when defining an API?

I'd prefer the length one simply because it gives me one less question to ask/look up in the documentation.
For the endIndex based one - is that an inclusive or exclusive end point?
(For either variant, the same question could be asked about startIndex, but it would be a perverse API that makes it exclusive).
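To make the trade-off concrete, here is a minimal Kotlin sketch (the wrapper names subStringByEnd and subStringByLength are hypothetical; Kotlin's built-in substring(startIndex, endIndex) takes an exclusive end index):

// Each convention is a one-line wrapper around the other.
fun subStringByEnd(s: String, startIndex: Int, endIndex: Int): String =
    s.substring(startIndex, endIndex)            // endIndex exclusive

fun subStringByLength(s: String, startIndex: Int, length: Int): String =
    s.substring(startIndex, startIndex + length) // start + length = exclusive end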

How to disambiguate positional arguments...
use longer names subStringFromUpto( startIndex , stopIndex )
use uniform convention across the whole library
Didn't we find better after all these years ?
Ah yes, in Smalltalk maybe, since the question is tagged language-agnostic...
aString copyFrom: startIndex to: stopIndex.
aString substringOfLength: length startingAt: startIndex.
Less ambiguity, but maybe we'll have to wait another 30 years before larger adoption of such style
(it probably looks too much simple to be serious)

This is a good question, and I think the preference for which to use comes down to the most common use cases. Most use cases are equally simple with either API, but consider this one:
You want to get a substring that starts at 5 and ends at the end of the string. Using the index-based version (assuming its second index is exclusive), it's as simple as:
str.subString(5, str.length());
With the length based API:
str.subString(5, str.length() - 5);
That second approach is much less succinct and obvious. However, this can be solved by simply stating that if the length would overflow the remaining string, the call gracefully supports it (e.g. str.subString(5, str.length()); would grab everything from index 5 to the end even though it may be asking for more characters than are left). Ruby does this with its String#slice method, in addition to supporting advanced things like negative indices.
In my opinion, the index-based approach is more concrete, especially when negative indices aren't allowed. That makes it very obvious what to expect from the API, which can be a good thing; it makes it harder to shoot yourself in the foot. However, a well-documented API, like Ruby's, empowers the programmer and can make for some graceful substring handling.
I also find that in general, when I'm performing substring operations, I often know my beginning and end points. With the length-based approach, that requires an additional calculation when calling the API (e.g. substring(startIndex, endIndex - startIndex)).
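To illustrate the "graceful overflow" idea from above, a minimal Kotlin sketch (clampedSubstring is a hypothetical helper, not a stdlib function):

// Length-based substring that clamps to the end of the string instead of throwing.
fun clampedSubstring(s: String, start: Int, length: Int): String =
    s.substring(start, (start + length).coerceAtMost(s.length))

With this, clampedSubstring("hello world", 5, 100) returns " world" instead of throwing an out-of-bounds exception.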

Someone should do a study of typical call sites to find out which approach yields more succinct code (and therefore probably more correct code).
I like the argument that using 'length' you don't have to look at the documentation, but you may already be looking at the documentation to determine whether the 2nd integer is the 'end' or the 'length'. If you name it endExclusive, then it's just as self-documenting.

Why do kotlin.math functions not have implementations for Long?

I have been working with Kotlin for a little over 2 years now.
Looking over what I learned in these 2 years, I noticed that I have been using (num.toDouble()).toLong() with kotlin.math functions a bit too much. For example, Math.sqrt(num.toDouble()).toLong(). Two of my projects have an extension function sumByLong() inside a util file created by the team, because the Kotlin libs only have sumBy (returning Int) and sumByDouble (returning Double), and a lot of the work in the project uses Long.
In short, mathematical operations using Long are more common than using Double or Float, yet Long has a very small footprint in the Kotlin standard library. And since kotlin.math is different from java.lang.Math, mixed usage is not a recommended practice.
Going over the docs of kotlin.math, all functions except abs, min, and max have implementations for Float and Double only.
Can someone explain, like I'm five, the possible reasoning behind this? Something real, not silly stuff like "the devs were lazy" or "more code means more work", which is all I could find in search-engine results.
--
Update: Some Clarification
1. I can understand that in most cases the return types will contain floating-point numbers. I am also talking about parameters lacking a Long counterpart. Maybe using Math.sqrt wasn't the best example; something like math.log, math.cos, etc. would be a better example, where a floating-point return type is expected but the parameters don't even support Int.
2. When I said "Long is more common than using Double", I was not talking about the public at large, but looking over my past two years working with Kotlin. I am sorry if my phrasing wasn't clear.
Disclaimer: this answer may be a little opinionated, but I believe it is in line with the general consensus and best practices for using maths in computer science.
Mathematics for integers and for real numbers (floats) are really two very different math "sub-worlds". They're pretty separate, they have different uses, and we usually don't mix them.
If we work on some physics, do real-world simulations, or operate on units like temperature or speed, we use doubles. If we have identifiers (a bank account number), count something (the number of bank accounts), or operate on discrete values with 100% precision (a bank account balance), we always use integers and never doubles.
Operations like sine, square root, or logarithm make perfect sense for physics, but not really for bank account values. They very often produce either very small or very large numbers that can't be safely represented as integers. They operate on approximations and don't really provide 100% precise results. They are continuous by nature, while integers are discrete.
What is the point of using integers with sqrt() or log() if they almost always return a floating-point result? What is the point of passing an integer to sin() if, for example, there are only 2 distinct angles smaller than a right angle that can be represented as an integer: 0 and 1? Using integers with these functions is unnatural and impractical.
I can't think of a case where we often have to convert between longs and doubles. Usually, we operate either on longs or on doubles top to bottom, and we don't convert between them too often. By converting we lose the advantages of each of these specific "math sub-worlds" and sum up their disadvantages. Maybe you should just keep using doubles in your application and not convert to/from longs? Why do you use longs?
BTW, you mentioned that you can't/shouldn't use java.lang.Math in a Kotlin application. Well, if you look into java.lang.Math you will notice that, apart from a handful of exceptions like abs, min, and max... it operates on doubles :-)
In the case of ceil, it returns a Double because a Double has a bigger range of values than Long. Consider, for example:
ceil(Long.MAX_VALUE.toDouble() * 1000)
What would you expect it to return if it returned a Long? For further discussion, see Why does Math.ceil return a double?
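A quick Kotlin sketch of the point (using only the stdlib's kotlin.math.ceil):

import kotlin.math.ceil

fun main() {
    val x = Long.MAX_VALUE.toDouble() * 1000
    println(ceil(x))          // ~9.2e21, representable as a Double
    println(ceil(x).toLong()) // silently clamps to Long.MAX_VALUE, losing the value
}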
In the case of log and trigonometric functions, the use cases requiring Long parameters are rare and the requirements varied. For example, should it round up, down, or to the nearest integral value? These are decisions that should be made for your particular project, and therefore can't be made in the stdlib.
In your project, you can simply define your required functions in a single, small source file, encoding your project's choice of rounding method, and then use them everywhere instead of converting at each call site, e.g.:
import kotlin.math.cos
import kotlin.math.roundToLong

fun cos(n: Long): Long = cos(n.toDouble()).roundToLong()
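In the same spirit, the sumByLong() utility mentioned in the question is only a few lines to write yourself (a sketch; recent Kotlin versions also offer sumOf in the stdlib):

fun <T> Iterable<T>.sumByLong(selector: (T) -> Long): Long {
    var sum = 0L
    for (element in this) sum += selector(element)
    return sum
}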

Why is Kotlin's Number-class missing operators?

In Kotlin, the Number type sounds quite useful: A type to use whenever I need something numeric.
When actually using it, however, I quickly noticed it is pretty useless: I cannot use any operators on these numbers. As soon as I need to do something with them, I need to explicitly convert them (even for comparing).
Why did the language designers choose to not include operators in the Number specification?
Thinking on this, I noticed it could be tricky to implement Number.plus(n: Number): Number, because n might be of a different type than this.
On the other hand, such implementations do exist in all Number subtypes I checked. And of course they are necessary if I want to type 1 + 1.2, which calls Int.plus(d: Double): Double
The result for me is that I have to call .toDouble() every time I use a number. This makes the code hard to read (compare a.toDouble() < b.toDouble() with a < b).
Is there any technical reason why operators were omitted from Number?
The problem is the implementation of the compareTo method. While it sounds reasonable and easy to add it in the first place, the devil lies in the details:
How would you compare instances of arbitrary Number classes to each other? Kotlin could implement the compare method using toDouble(); however, this has problems with equality/precision: how do you compare a BigDecimal to a Double? Using toDouble() on the BigDecimal might lose precision, and two (actually different) BigDecimals might be considered equal using this method.
The mess gets even worse when you start to assume one or both types were supplied by libraries, where you cannot make assumptions on precision etc.
In Java, the Number type is not Comparable either.
Furthermore, some Number values like NaN might not be comparable at all.
If you need a Number to be comparable, you can easily implement your own compareTo method as an extension function. This has some additional limitations, though, as most Number subtypes implement Comparable, and their member implementations will always win over the extension.
Credit for this answer goes to Roland, I only extended his comments (see on the question) into an answer.
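For illustration, a minimal sketch of the extension-function approach mentioned above (it uses the lossy toDouble() comparison, so it inherits exactly the BigDecimal precision caveat discussed earlier):

import java.math.BigDecimal

// Compares any two Numbers by converting both sides to Double first.
operator fun Number.compareTo(other: Number): Int =
    this.toDouble().compareTo(other.toDouble())

fun main() {
    val a: Number = 1
    val b: Number = 1.2
    println(a < b) // true -- resolves to the extension, since Number has no member compareTo

    // The precision caveat in action: two different BigDecimals compare as equal.
    val big1: Number = BigDecimal("1.00000000000000000001")
    val big2: Number = BigDecimal.ONE
    println(big1 > big2) // false -- toDouble() rounds both to 1.0
}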

Optimization of Lisp function calls

If my code is calling a function, and one of the function's arguments varies based on a certain condition, is it more efficient to put the conditional expression inside the function's argument, or to call the function separately in each branch of the conditional?
Example:
(if condition (+ 4 3) (+ 5 3))
(+ (if condition 4 5) 3)
Obviously this is just an example: in the real scenario the numbers would be replaced by long, complex expressions full of variables. The if might instead be a long cond statement.
Which would be more efficient in terms of speed, space etc?
Don't
What you care about is not performance (in this case the difference will be trivial) but code readability.
Remember, "... a computer language is not just a way of getting a computer to perform operations, but rather ... it is a novel formal medium for expressing ideas about methodology" (Abelson/Sussman, Structure and Interpretation of Computer Programs).
You are writing code primarily for others (and you yourself next year!) to read. The fact that the computer can execute it is a welcome fringe benefit.
(I am exaggerating, of course, but much less than you think).
Okay...
Now that you skipped the harangue (if you claim you did not, close your eyes and tell me which specific language I mention above), let me try to answer your question.
If you profiled your program and found that this place is the bottleneck, you should first make sure that you are using the right algorithm.
E.g., using a linearithmic sort (merge/heap) instead of quadratic (bubble/insertion) sort will make much bigger difference than micro-optimizations like you are contemplating.
Then you should disassemble both versions of your code; the shorter version is (ceteris paribus) likely to be marginally faster.
Finally, you can waste a couple of hours of machine time repeatedly running both versions on the same input on an otherwise idle box to discover that there is no statistically significant difference between the two approaches.
I agree with everything in sds's answer (except using a trick question -_-), but I think it might be nice to add an example. The code you've given doesn't have enough context to be transparent. Why 5? Why 4? Why 3? When should each be used? Should there always be only two options? The code you've got now is sort of like:
(defun compute-cost (fixed-cost transaction-type)
  (+ fixed-cost
     (if (eq transaction-type 'discount) ; hardcoded magic numbers
         3                               ; and conditions are brittle
         4)))
Remember, if you need these magic numbers (3 and 4) here, you might need them elsewhere. If you ever have to change them, you'll have to hope you don't miss any cases. It's not fun. Instead, you might do something like this:
(defun compute-cost (fixed-cost transaction-type)
  (+ fixed-cost
     (variable-cost transaction-type)))

(defun variable-cost (transaction-type)
  (case transaction-type
    ((employee) 2) ; oh, an extra case we'd forgotten about!
    ((discount) 3)
    (t 4)))
Now there's an extra function call, it's true, but computation of the magic addend is pulled out into its own component, and can be reused by anything that needs it, and can be updated without changing any other code.

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in Java. I want to initialize solutions with some heuristics and give these initial solutions to CPLEX to find a better solution based on the model, but I also want to limit the search to a specific neighborhood. Any idea about how to do it?
Also, how can I limit the range of all variables? And what's best: implementing these heuristics and the local search in OPL itself, in Java, or even in C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: We have had a lot of success with approach b. In particular, it is simple to add constraints to fix some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints to limit the values to be similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables it can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i]) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i]) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;

forall(i in indexSet)
  if (previousValue[i] <= 0.5)
    changed[i] == (var[i] >= 0.5); // was zero before
  else
    changed[i] == (var[i] <= 0.5); // was one before

sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. Not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax - we just found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed fine-grained control of what we are doing, and find we need to create many more arrays and index sets in OPL to achieve what we want, while we can achieve the same effect with more intelligent loops etc without creating so many data structures in a language like C#.
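To give a flavour of what this looks like from a JVM language, here is a rough Kotlin sketch against the CPLEX Java (Concert) API, combining the warm-start and Hamming-distance ideas above. Treat it as an outline, not a complete program: it assumes you already have an IloCplex model with binary variables x, a previous solution prev, and a chosen neighbourhood size maxFlips.

import ilog.concert.IloNumExpr
import ilog.concert.IloNumVar
import ilog.cplex.IloCplex

// Warm-start from the previous solution, then bound the Hamming distance
// of the binary variables so the new solution stays in the neighbourhood.
fun resolveNearPrevious(cplex: IloCplex, x: Array<IloNumVar>, prev: DoubleArray, maxFlips: Double) {
    cplex.addMIPStart(x, prev) // hand the previous solution to CPLEX as a MIP start

    val flips: Array<IloNumExpr> = Array(x.size) { i ->
        if (prev[i] > 0.5) cplex.diff(1.0, x[i]) // was one: a flip means x[i] becomes 0
        else x[i]                                // was zero: a flip means x[i] becomes 1
    }
    cplex.addLe(cplex.sum(flips), maxFlips)

    cplex.solve()
}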
Those are several questions. So here are some pointers and suggestions:
In CPLEX, you give your model an initial solution using IloOplCplexVectors().
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.)
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. Change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.

Exponents in Genetic Programming

I want to have real-valued exponents (not just integers) for the terminal variables.
For example, let's say I want to evolve a function y = x^3.5 + x^2.2 + 6. How should I proceed? I haven't seen any GP implementations that can do this.
I tried using the power function, but sometimes the initial solutions have so many exponents that the evaluated value exceeds 'double' bounds!
Any suggestion would be appreciated. Thanks in advance.
DEAP (in Python) implements it; in fact, there is an example for that. By adding Python's math.pow to the primitive set you can achieve what you want.
pset.addPrimitive(math.pow, 2)
But using the pow operator you risk getting something like x^(x^(x^(x))), which is probably not desired. You would need to add a restriction (by some means I'm not sure of) on where in your tree pow is allowed (just before a leaf, or something like that).
OpenBeagle (in C++) also allows it, but you will need to develop your own primitive using pow from <math.h>; you can use the Sin or Cos primitive as an example.
If only some of the initial population are suffering from the overflow problem then just penalise them with a poor fitness score and they will probably be removed from the population within a few generations.
But if the problem is that virtually all individuals suffer from it, then you will have to add some constraints. The simplest thing to do would be to constrain the exponent child of the power function to be a real literal, which would mean powers could not be nested. It depends on whether this is sufficient for your needs, though. There are a few ways to add constraints like these (or more complex ones); try looking into Constrained Syntactic Structures and grammar-guided GP.
A few other simple thoughts: can you use a data-type with a larger range? Also, you could reduce the maximum depth parameter, so that there will be less room for nested exponents. Of course that's only possible to an extent, and it depends on the complexity of the function.
Integers have a different binary representation than reals, so you have to use a slightly different bitstring representation and recombination/mutation operator.
For an excellent demonstration, see slide 24 of www.cs.vu.nl/~gusz/ecbook/slides/Genetic_Algorithms.ppt or check out the Eiben/Smith book "Introduction to Evolutionary Computing". This describes how to map a bit string to a real number. You can then create a representation where x only lies within an interval [y, z]. In this case, choose y and z to be of smaller magnitude than the capacity of the data type you are using (e.g. 10^308 for a double) so you don't run into the overflow issue you describe.
You have to consider that with real-valued exponents and a negative base you will not obtain a real number, but a complex one. For example, the Math.Pow implementation in .NET returns NaN if you attempt to raise a negative base to a non-integer exponent. You have to make sure all your x values are positive. I think that's the problem you're seeing when you "exceed double bounds".
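A common way to deal with both problems at once (overflow and negative bases) is a "protected" power primitive, analogous to the protected division commonly used in GP. A minimal Kotlin sketch (protectedPow is a hypothetical helper, not taken from any of the libraries mentioned here):

import kotlin.math.pow

// Returns a neutral value instead of NaN/Infinity so evaluation never blows up;
// you can still penalise such individuals through the fitness function.
fun protectedPow(base: Double, exponent: Double): Double {
    if (base < 0.0) return 1.0 // a negative base with a non-integer exponent is complex-valued
    val result = base.pow(exponent)
    return if (result.isFinite()) result else 1.0
}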
Btw, you can try the HeuristicLab GP implementation. It is very flexible with a configurable grammar.