How to use Apache commons-math RealVector sparse iterators? - apache

I would like to iterate through the non-zero values in a RealVector. I notice that the method RealVector.sparseIterator() can do this:
"Create a sparse iterator over the vector, which may omit some entries. The ommitted entries are either exact zeroes (for dense implementations) or are the entries which are not stored (for real sparse vectors). No guarantees are made about order of iteration."
However, the method returns
Iterator<RealVector.Entry>
object, where "Entry" is a protected class and therefore I cannot really use it outside of the RealVector class.
Have I misunderstood something? Is there anyway to iterate through the values in the RealVector object without converting them to double[], because the vector is very high dimensional and very sparse.
Many thanks!

This is indeed a bug (see https://issues.apache.org/jira/browse/MATH-1329). It will most likely be fixed in the next release.

Related

Why does official prefer concantane than hstack/vstack in Numpy?

I find that the latest documentation about hstack/vstack note that "you should prefer np.concatenate or np.stack".
But I think their readability is better than concatenate(a, 0) or concatenate(a, 1)
All 3 'stack' functions use concatenate (as does np.append and column_stack). It's instructive to look at their code. np.source(np.hstack) for example.
What they all do is massage the dimensions of the input arrays, making sure they are are 1d or 2d etc, and then call concatenate with the appropriate axis. So in the long run it's a good idea to know how to use concatenate without the 'crutch' of the others.
But people will continue to use hstack and vstack where convenient. dstack and column_stack are less common. np.append is frequently misused and should be banished.
I think this 'preferred' note was added when np.stack was added. np.stack also uses concatenate, but in a somewhat more sophisticated way. It inserts a new axis (with expand_dims). I view it as a generalization of np.array. When given a list of matching arrays, np.array joins them on a new initial axis. np.stack does the same thing as a default, but lets us specify a different 'new' axis for concatenation.
I should qualify my answer. It is not official. Rather I'm making an educated guess based on knowledge of the code.

VB.NET "For each" versus ".GetUpperBound(0)"

I would like to know what is preferred...
Dim sLines() As String = s.Split(NewLine)
For each:
For Each sLines_item As String In sLines
.GetUpperBound:
For i As Integer = 0 To sLines.GetUpperBound(0)
I have no idea why the "For Each" was introduced for such cases. Until now I have only used .GetUpperBound, and I don't see any PRO for the "For Each".
Thank you
ps: When I use ."GetUpperBound(0)", I do know that I am iterating over the vector.
The "For Each" in contrast sounds like "I don't care in which order the vector is given to me". But that is just personal gusto, I guess.
Short answer: Do not use GetUpperBound(). The only advantage of GetUpperBound() is that it works for multi-dimensional arrays, where Length doesn't work. However, even that usage is outdated since there is Array.GetLength() available that takes the dimension parameter. For all other uses, For i = 0 to Array.Length - 1 is better and probably the fastest option.
It's largely a personal preference.
If you need to alter the elements of the array, you should use For i ... because changing sLines_item will not affect the corresponding array element.
If you need to delete elements of the array, you can iterate For i = ubound(sLines) to 0 step -1 (or the equivalent).
Short answer
You should always use For Each on IEnumerable types unless you have no other choice.
Long answer
Contrary to the popular understanding, For Each is not a syntactic sugar on top of For Next. It will not necessarily iterate over every element of its source. It is a syntactic sugar on top of IEnumerable.GetEnumerator(). For Each will first get an enumerator to its source then loop until it cannot enumerate further. Basically, it will be replaced by the following code. Keep in mind that this is an oversimplification.
' Ask the source for a way to enumerate its content in a forward only manner.
Dim enumerator As IEnumerator = sLines.GetEnumerator()
' Loop until there is no more element in front of us.
While enumerator.Next() Then
' Invoke back the content of the for each block by passing
' the currently enumerated element.
forEachContent.Invoke(enumerator.Current)
End While
The major difference between this and a classical For Next loop is that it does not depend on any length. This fixes two limitations in modern .NET languages. The first one has to do with the Count method. IEnumerable provides a Count method, but the implementation might not be able to keep track of the actual amount of elements it stores. Because of this, calling IEnumerable.Count might cause the source to be iterated over to actually count the amount of element it contains. Moreover, doing this as the end value for traditional For Next loop will cause this process to be done for every element in the loop. This is very slow. Here is an illustration of this process:
For i As Integer = 0 To source.Count() ' This here will cause Count to be
' evaluated for every element in source.
DoSomething(source(i))
Next
The use of For Each fixes this by never requesting the length of the source.
The second limitation it fixes is the lack of a concept for arrays with infinite amount of elements. An example of such cases would be an array containing every digit of PI where each digit is only calculated when you request them. This is where LINQ makes its entrance and really shines because it enables you to write the following code:
Dim piWith10DigitPrecision = From d In InfinitePiSource
Take 10
Dim piWith250DigitPrecision = From d In InfinitePiSource
Take 250
Dim infite2PiSource = From d In InfinitePiSource
Select d * 2
Now, in an infinite source, you cannot depend on a length to iterate over all of its elements. It has an infinite length thus making a traditional For Next loop an infinite loop. This does not change anything for the first two examples I have given with pi because we explicitly provides the amount of elements we want, but it does for the third one. When would you stop iterating? For Each, when combined with Yield (used by the Take operator), makes sure that you never iterate until you actually requests a specific value.
You might have already figured it out by now but these two things means that For Each effectively have no concept of bounds because it simply does not require them. The only use for GetLowerBound and GetUpperBound are for non-zero-indexed arrays. For instance, you might have an array that indexes values from 1 instead of zero. Even then, you only need GetLowerBound and Length. Obviously, this is only if the position of the element in the source actually matters. If it does not, you can still use For Each to iterate over all elements as it is bound agnostic.
Also, as already mentioned, GetLength should be used for zero-indexed multi-dimensional arrays, again, only if the position of the element matters and not just the element itself.

How to write additional methods in Smalltalk Collections which work only for Numeric Inputs?

I want to add a method "average" to array class.
But average doesn't make any sense if input array contains characters/strings/objects.
So I need to check if array contains only integers/floats.
Smalltalk says datatype check [checking if variable belongs to a particular datatype like int string array etc... or not] is a bad way of programming.
So what is best way to implement this?
The specification is somewhat incomplete. You'd need to specify what behavior the collection should show when you use it with non-numeric input.
There are a huge number of possibly desirable behaviors. Smalltalk supports most of them, except for the static typing solution (throw a compile-time error when you add a non-numeric thing to a numeric collection).
If you want to catch non-numeric objects as late as possible, you might just do nothing - objects without arithmetic methods will signal their own exceptions when you try arithmetic on them.
If you want to catch non-numeric elements early, implement a collection class which ensures that only numeric objects can be added (probably by signaling an exception when you add a non-numeric object is added).
You might also want to implement "forgiving" methods for sum or average that treat non-numeric objects as either zero-valued or non-existing (does not make a difference for #sum, but for #average you would only count the numeric objects).
In pharo at least there is
Collection >> average
^ self sum / self size
In Collections-arithmetic category. When you work with with a statically typed languages you are being hit by the language when you add non-number values to the collection. In dynamically typed languages you the same happens when you try to calculate average of inappropriate elements e.i. you try to send +, - or / to an object that does not understand it.
Don't think where you put data, think what are you doing with it.
It's reasonable to check type if you want to do different things, e.g.:
(obj isKindOf: Number) ifTrue: [:num| num doItForNum].
(obj isKindOf: Array ) ifTrue: [:arr| arr doItForArr].
But in this case you want to move the logic of type checking into the object-side.
So in the end it will be just:
obj doIt.
and then you'll have also something like:
Number >> doIt
"do something for number"
Array >> doIt
"do something for array"
(brite example of this is printOn: method)
I would have thought the Smalltalk answer would be to implement it for numbers, then be mindful not to send a collection of pets #sum or #average. Of course, if there later becomes a useful implementation for a pet to add itself to another pet or even an answer to #average, then that would be up to the implementer of Pet or PetCollection.
I did a similar thing when I implemented trivial algebra into my image. It allowed me to mix numbers, strings, and symbols in simple math equations. 2 * #x result in 2x. x + y resulted in x + y. It's a fun way to experiment with currencies by imagining algebra happening in your wallet. Into my walled I deposit (5 x #USD) + (15 * #CAN) for 5USD + 15CAN. Given an object that converts between currencies I can then answer what the total is in either CAN or USD.
We actually used it for supply-chain software for solving simple weights and measures. If a purchase order says it will pay XUSD/1TON of something, but the supplier sends foot-lbs of that same thing, then to verify the shipment value we need a conversion between ton and foot-lbs. Letting the library reduce the equation we're able to produce a result without molesting the input data, or without having to come up with new objects representing tons and foot-pounds or anything else.
I had high ambitions for the library (it was pretty simple) but alas, 2008 erased the whole thing...
"I want to add a method "average" to array class. But average doesn't make any sense if input array contains characters/strings/objects. So I need to check if array contains only integers/floats."
There are many ways to accomplish the averaging of the summation of numbers in an Array while filtering out non-numeric objects.
First I'd make it a more generic method by lifting it up to the Collection class so it can find more cases of reuse. Second I'd have it be generic for numbers rather than just floats and integers, oh it'll work for those but also for fractions. The result will be a float average if there are numbers in the collection array list.
(1) When adding objects to the array test them to ensure they are numbers and only add them if they are numbers. This is my preferred solution.
(2) Use the Collection #select: instance method to filter out the non-numbers leaving only the numbers in a separate collection. This makes life easy at the cost of a new collection (which is fine unless you're concerned with large lists and memory issues). This is highly effective, easy to do and a common solution for filtering collections before performing some operation on them. Open up a Smalltalk and find all the senders of #select: to see other examples.
| list numberList sum average |
list := { 100. 50. 'string'. Object new. 1. 90. 2/3. 88. -74. 'yup' }.
numberList := list select: [ :each | each isNumber ].
sum := numberList sum.
average := sum / (numberList size) asFloat.
Executing the above code with "print it" will produce the following for the example array list:
36.523809523809526
However if the list of numbers is of size zero, empty in other words then you'll get a divide by zero exception with the above code. Also this version isn't on the Collection class as an instance method.
(3) Write an instance method for the Collection class to do your work of averaging for you. This solution doesn't use the select since that creates intermediate collections and if your list is very large that's a lot of extra garbage to collect. This version merely loops over the existing collection tallying the results. Simple, effective. It also addresses the case where there are no numbers to tally in which case it returns the nil object rather than a numeric average.
Collection method: #computeAverage
"Compute the average of all the numbers in the collection. If no numbers are present return the nil object to indicate so, otherwise return the average as a floating point number."
| sum count average |
sum := 0.
count := 0.
self do: [ :each |
each isNumber ifTrue: [
count := count +1.
sum := sum + each.
]
].
count > 0 ifTrue: [
^average := sum / count asFloat
] ifFalse: [
^nil
]
Note the variable "average" is just used to show the math, it's not actually needed.
You then use the above method as follows:
| list averageOrNil |
list := { 100. 50. 'string'. Object new. 1. 90. 2/3. 88. -74. 'yup' }.
averageOrNil := list computeAverage.
averageOrNil ifNotNil: [ "got the average" ] ifNil: [ "there were no numbers in the list"
Or you can use it like so:
{
100. 50. 'string'. Object new. 1. 90. 2/3. 88. -74. 'yup'
} computeAverage
ifNotNil: [:average |
Transcript show: 'Average of list is: ', average printString
]
ifNil: [Transcript show: 'No numbers to average' ].
Of course if you know for sure that there are numbers in the list then you won't ever get the exceptional case of the nil object and you won't need to use an if message to branch accordingly.
Data Type/Class Checking At Runtime
As for the issue you raise, "Smalltalk says datatype check [checking if variable belongs to a particular datatype like int string array etc... or not] is a bad way of programming", there are ways to do things that are better than others.
For example, while one can use #isKindOf: Number to ask each element if it's not the best way to determine the "type" or "class" at runtime since it locks it in via predetermined type or class as a parameter to the #isKindOf: message.
It's way better to use an "is" "class" method such as #isNumber so that any class that is a number replies true and all other objects that are not numeric returns false.
A main point of style in Smalltalk when it comes to ascertaining the types or classes of things is that it's best to use message sending with a message that the various types/classes comprehend but behave differently rather than using explicit type/class checking if at all possible.
The method #isNumber is an instance method on the Number class in Pharo Smalltalk and it returns true while on the Object instance version it returns false.
Using polymorphic message sends in this away enables more flexibility and eliminates code that is often too procedural or too specific. Of course it's best to avoid doing this but reality sets in in various applications and you have to do the best that you can.
This is not the kind of thing you do in Smalltalk. You could take suggestions from the above comments and "make it work" but the idea is misguided (from a Smalltalk point of view).
The "Smalltalk" thing to do would be to make a class that could perform all such operations for you --computing the average, mean, mode, etc. The class could then do the proper checking for numerical inputs, and you could write how it would respond to bad input. The class would use a plain old array, or list or something. The name of the class would make it clear what it's usage would be for. The class could then be part of your deployment and could be exported/imported to different images as needed.
Make a new collection class; perhaps a subclass of Array, or perhaps of OrderedCollection, depending on what collection related behaviour you want.
In the new class' at:put: and/or add: methods test the new item for #isNumber and return an error if it fails.
Now you have a collection you can guarantee will have just numeric objects and nils. Implement your required functions in the knowledge that you won't need to deal with trying to add a Sealion to a Kumquat. Take care with details though; for example if you create a WonderNumericArray of size 10 and insert two values into it, when you average the array do you want to sum the two items and divide by two or by ten?

What does "in-place" mean?

Reverse words in a string (words are
separated by one or more spaces). Now
do it in-place.
What does in-place mean?
In-place means that you should update the original string rather than creating a new one.
Depending on the language/framework that you're using this could be impossible. (For example, strings are immutable in .NET and Java, so it would be impossible to perform an in-place update of a string without resorting to some evil hacks.)
In-place algorithms can only use O(1) extra space, essentially. Array reversal (essentially what the interview question boils down to) is a classic example. The following is taken from Wikipedia:
Suppose we want to reverse an array of n items. One simple way to do this is:
function reverse(a[0..n])
allocate b[0..n]
for i from 0 to n
b[n - i] = a[i]
return b
Unfortunately, this requires O(n) extra space to create the array b, and allocation is often a slow operation. If we no longer need a, we can instead overwrite it with its own reversal using this in-place algorithm:
function reverse-in-place(a[0..n])
for i from 0 to floor(n/2)
swap(a[i], a[n-i])
Sometimes doing something in-place is VERY HARD. A classic example is general non-square matrix transposition.
See also
In-place algorithm
In-place matrix transposition
You should change the content of the original string to the reverse without using a temporary storage variable to hold the string.

can a variable have multiple values

In algebra if I make the statement x + y = 3, the variables I used will hold the values either 2 and 1 or 1 and 2. I know that assignment in programming is not the same thing, but I got to wondering. If I wanted to represent the value of, say, a quantumly weird particle, I would want my variable to have two values at the same time and to have it resolve into one or the other later. Or maybe I'm just dreaming?
Is it possible to say something like i = 3 or 2;?
This is one of the features planned for Perl 6 (junctions), with syntax that should look like my $a = 1|2|3;
If ever implemented, it would work intuitively, like $a==1 being true at the same time as $a==2. Also, for example, $a+1 would give you a value of 2|3|4.
This feature is actually available in Perl5 as well through Perl6::Junction and Quantum::Superpositions modules, but without the syntax sugar (through 'functions' all and any).
At least for comparison (b < any(1,2,3)) it was also available in Microsoft Cω experimental language, however it was not documented anywhere (I just tried it when I was looking at Cω and it just worked).
You can't do this with native types, but there's nothing stopping you from creating a variable object (presuming you are using an OO language) which has a range of values or even a probability density function rather than an actual value.
You will also need to define all the mathematical operators between your variables and your variables and native scalars. Same goes for the equality and assignment operators.
numpy arrays do something similar for vectors and matrices.
That's also the kind of thing you can do in Prolog. You define rules that constraint your variables and then let Prolog resolve them ...
It takes some time to get used to it, but it is wonderful for certain problems once you know how to use it ...
Damien Conways Quantum::Superpositions might do what you want,
https://metacpan.org/pod/Quantum::Superpositions
You might need your crack-pipe however.
What you're asking seems to be how to implement a Fuzzy Logic system. These have been around for some time and you can undoubtedly pick up a library for the common programming languages quite easily.
You could use a struct and handle the operations manualy. Otherwise, no a variable only has 1 value at a time.
A variable is nothing more than an address into memory. That means a variable describes exactly one place in memory (length depending on the type). So as long as we have no "quantum memory" (and we dont have it, and it doesnt look like we will have it in near future), the answer is a NO.
If you want to program and to modell this behaviour, your way would be to use a an array (with length equal to the number of max. multiple values). With this comes the increased runtime, hence the computations must be done on each of the values (e.g. x+y, must compute with 2 different values x1+y1, x2+y2, x1+y2 and x2+y1).
In Perl , you can .
If you use Scalar::Util , you can have a var take 2 values . One if it's used in string context , and another if it's used in a numerical context .