What does "in-place" mean? - semantics

Reverse words in a string (words are
separated by one or more spaces). Now
do it in-place.
What does in-place mean?

In-place means that you should update the original string rather than creating a new one.
Depending on the language/framework that you're using this could be impossible. (For example, strings are immutable in .NET and Java, so it would be impossible to perform an in-place update of a string without resorting to some evil hacks.)

In-place algorithms can only use O(1) extra space, essentially. Array reversal (essentially what the interview question boils down to) is a classic example. The following is taken from Wikipedia:
Suppose we want to reverse an array of n items. One simple way to do this is:
function reverse(a[0..n])
allocate b[0..n]
for i from 0 to n
b[n - i] = a[i]
return b
Unfortunately, this requires O(n) extra space to create the array b, and allocation is often a slow operation. If we no longer need a, we can instead overwrite it with its own reversal using this in-place algorithm:
function reverse-in-place(a[0..n])
for i from 0 to floor(n/2)
swap(a[i], a[n-i])
Sometimes doing something in-place is VERY HARD. A classic example is general non-square matrix transposition.
See also
In-place algorithm
In-place matrix transposition

You should change the content of the original string to the reverse without using a temporary storage variable to hold the string.

Related

Referencing nested arrays in awk

I'm creating a bunch of mappings that can be indexed into using 3 keys such as below:
mappings["foo"]["bar"]["blah"][1]=0
split( "10,13,19,49", mappings["foo"]["bar"]["blah"] )
I can then index into the nested array using for example
mappings[product][format][version][i]
But this is a bit long-winded when I need to refer to the same nested array several times, so in other languages I'd create a reference to the inner array:
map=mappings[product][format][version]
map[i]
However, I can't seem to get this to work in awk (gawk 4.1.3).
I can only find one link over google, that suggests this is impossible in previous versions of awk, and a loop setting the keys and values one-by-one is the only solution. Is this still the case or does anyone have a suggestions for a better solution?
https://developer.apple.com/library/archive/documentation/OpenSource/Conceptual/ShellScripting/Howawk-ward/Howawk-ward.html
EDIT
In response to comments a bit more background on what I'm trying to do. If there is a better approach, I'm all for using it!
I have set of CSV files that I'm feeding into AWK. The idea is to calculate a checksum based on specific columns after applying filtering to the rows.
The columns to checksum on, and the filtering to apply, are derivived from runtime parameters sent into the script.
The runtime parameters are a triple of (product,format,version), hence my use of a 3-nested assoicative array.
Another approach would be to use triple as a single key, rather than nesting, but gawk doesn't seem to natively support this, so I'd end-up concatenating the values as string. This felt a bit less structured to me, but if I'm wrong, happy to change my mind on this apporach.
Anyway, it is these parameters that are used to index into the array to structure to retrieve the column numbers, etc.
You can then build-up a tree-like structure, for example, the below shows 2 formats for product foo on version blah, and so on...:
mappings["product-foo"]["format-bar"]["version-blah"][1]=0
split( "10,13,19,49", mappings["product-foo"]["format-bar"]["version-blah"] )
mappings["product-foo"]["format-moo"]["version-blah"][1]=0
split( "55,23,14,6", mappings["product-foo"]["format-moo"]["version-blah"] )
The magic happens like this, you can see how long-winded the mappings indexing becomes without referencing:
(FNR>1 && (format!="some-format" ||
(version=="some-version" && $1=="some-filter") ||
(version=="some-other-version" && $8=="some-other-filter"))) {
# Loop over each supplied field summing an absolute tally for each
for (i=1; i <= length(mappings[product][format][version]); i++) {
sumarr[i] += ( $mappings[product][format][version][i] < 0 ? -$mappings[product][format][version][i]:$mappings[product][format][version][i] )
}
}
The comment from #ed-morton simplifies this as originally requested, but interested if their is a simpler approach.
The right answer is from #ed-morton above (thanks!).
Ed - if you write it out as an answer I'll accept it, otherwise I'll accept this quote in a few days for good housekeeping.
Right, there is no array copy functionality in awk and there are no pointers/references so you can't create a pointer to an array. You can of course create function map(i) { return mappings[product][format][version][i]}

How to use Apache commons-math RealVector sparse iterators?

I would like to iterate through the non-zero values in a RealVector. I notice that the method RealVector.sparseIterator() can do this:
"Create a sparse iterator over the vector, which may omit some entries. The ommitted entries are either exact zeroes (for dense implementations) or are the entries which are not stored (for real sparse vectors). No guarantees are made about order of iteration."
However, the method returns
Iterator<RealVector.Entry>
object, where "Entry" is a protected class and therefore I cannot really use it outside of the RealVector class.
Have I misunderstood something? Is there anyway to iterate through the values in the RealVector object without converting them to double[], because the vector is very high dimensional and very sparse.
Many thanks!
This is indeed a bug (see https://issues.apache.org/jira/browse/MATH-1329). It will most likely be fixed in the next release.

VB.NET "For each" versus ".GetUpperBound(0)"

I would like to know what is preferred...
Dim sLines() As String = s.Split(NewLine)
For each:
For Each sLines_item As String In sLines
.GetUpperBound:
For i As Integer = 0 To sLines.GetUpperBound(0)
I have no idea why the "For Each" was introduced for such cases. Until now I have only used .GetUpperBound, and I don't see any PRO for the "For Each".
Thank you
ps: When I use ."GetUpperBound(0)", I do know that I am iterating over the vector.
The "For Each" in contrast sounds like "I don't care in which order the vector is given to me". But that is just personal gusto, I guess.
Short answer: Do not use GetUpperBound(). The only advantage of GetUpperBound() is that it works for multi-dimensional arrays, where Length doesn't work. However, even that usage is outdated since there is Array.GetLength() available that takes the dimension parameter. For all other uses, For i = 0 to Array.Length - 1 is better and probably the fastest option.
It's largely a personal preference.
If you need to alter the elements of the array, you should use For i ... because changing sLines_item will not affect the corresponding array element.
If you need to delete elements of the array, you can iterate For i = ubound(sLines) to 0 step -1 (or the equivalent).
Short answer
You should always use For Each on IEnumerable types unless you have no other choice.
Long answer
Contrary to the popular understanding, For Each is not a syntactic sugar on top of For Next. It will not necessarily iterate over every element of its source. It is a syntactic sugar on top of IEnumerable.GetEnumerator(). For Each will first get an enumerator to its source then loop until it cannot enumerate further. Basically, it will be replaced by the following code. Keep in mind that this is an oversimplification.
' Ask the source for a way to enumerate its content in a forward only manner.
Dim enumerator As IEnumerator = sLines.GetEnumerator()
' Loop until there is no more element in front of us.
While enumerator.Next() Then
' Invoke back the content of the for each block by passing
' the currently enumerated element.
forEachContent.Invoke(enumerator.Current)
End While
The major difference between this and a classical For Next loop is that it does not depend on any length. This fixes two limitations in modern .NET languages. The first one has to do with the Count method. IEnumerable provides a Count method, but the implementation might not be able to keep track of the actual amount of elements it stores. Because of this, calling IEnumerable.Count might cause the source to be iterated over to actually count the amount of element it contains. Moreover, doing this as the end value for traditional For Next loop will cause this process to be done for every element in the loop. This is very slow. Here is an illustration of this process:
For i As Integer = 0 To source.Count() ' This here will cause Count to be
' evaluated for every element in source.
DoSomething(source(i))
Next
The use of For Each fixes this by never requesting the length of the source.
The second limitation it fixes is the lack of a concept for arrays with infinite amount of elements. An example of such cases would be an array containing every digit of PI where each digit is only calculated when you request them. This is where LINQ makes its entrance and really shines because it enables you to write the following code:
Dim piWith10DigitPrecision = From d In InfinitePiSource
Take 10
Dim piWith250DigitPrecision = From d In InfinitePiSource
Take 250
Dim infite2PiSource = From d In InfinitePiSource
Select d * 2
Now, in an infinite source, you cannot depend on a length to iterate over all of its elements. It has an infinite length thus making a traditional For Next loop an infinite loop. This does not change anything for the first two examples I have given with pi because we explicitly provides the amount of elements we want, but it does for the third one. When would you stop iterating? For Each, when combined with Yield (used by the Take operator), makes sure that you never iterate until you actually requests a specific value.
You might have already figured it out by now but these two things means that For Each effectively have no concept of bounds because it simply does not require them. The only use for GetLowerBound and GetUpperBound are for non-zero-indexed arrays. For instance, you might have an array that indexes values from 1 instead of zero. Even then, you only need GetLowerBound and Length. Obviously, this is only if the position of the element in the source actually matters. If it does not, you can still use For Each to iterate over all elements as it is bound agnostic.
Also, as already mentioned, GetLength should be used for zero-indexed multi-dimensional arrays, again, only if the position of the element matters and not just the element itself.

When to use StringBuilder? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
String vs StringBuilder
I just revisited some of the books that I used to pick up VB.NET. I am not sure I've got this in my head, understand how/what StringBuilder is.
What is the guidance for using? Is it best to use it if you are are concatenating 2 strings or 50?
Or when the the total string length is greater than 128 characters?
Or will you see a performance benefit whenever you use it to add strings together?
In which case is it better to use a StringBuilder instance to build a SQL statement than string.format("Select * from x where y = {0}",1)?
It's always struck me that declaring another variable and including a name space is not beneficial for small string concatenations, but I am not sure now.
Sorry, lot of documentation tells you what to use, just not what's best.
I've got an article on this very topic. In summary (copied from the bottom of the page):
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for other purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations the sizes of string involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
Here is my rule of thumb:
StringBuilder is best used when the exact number of concatenations is unknown at compile time.
Coding Horror has a good article concerning this question, The Sad Tragedy of Micro-Optimization Theater.
Personally I use StringBuilder when I have more than just one or two strings to concatenate. I'm not sure if there's a real performance hit to be gained, but I've always read and been told that doing a regular concatenation with multiple strings creates an extra copy of the string each time you do it, while using StringBuilder keeps one copy until you call the final ToString() method on it.
Someone's figured out experimentally that the critical number is 6. More than 6 concatenations in a row and you should use a StringBuilder. Can't remember where I found this.
However, note that if you just write this in a line:
"qwert" + "yuiop" + "asdf" + "gh" + "jkl;" + "zxcv" + "bnm" + ",."
That gets converted into one function call (I don't know how to write it in VB.net)
String.Concat("qwert", "yuiop", "asdf", "gh", "jkl;", "zxcv", "bnm", ",.");
So if you're doing all concatenations on one line, then don't bother with StringBuilder because String.Concat effectively will do all the concatenations in one go. It's only if you're doing them in a loop or successively concatenating.
My rule - when you're adding to a string in a For or Foreach loop, use the StringBuilder.

can a variable have multiple values

In algebra if I make the statement x + y = 3, the variables I used will hold the values either 2 and 1 or 1 and 2. I know that assignment in programming is not the same thing, but I got to wondering. If I wanted to represent the value of, say, a quantumly weird particle, I would want my variable to have two values at the same time and to have it resolve into one or the other later. Or maybe I'm just dreaming?
Is it possible to say something like i = 3 or 2;?
This is one of the features planned for Perl 6 (junctions), with syntax that should look like my $a = 1|2|3;
If ever implemented, it would work intuitively, like $a==1 being true at the same time as $a==2. Also, for example, $a+1 would give you a value of 2|3|4.
This feature is actually available in Perl5 as well through Perl6::Junction and Quantum::Superpositions modules, but without the syntax sugar (through 'functions' all and any).
At least for comparison (b < any(1,2,3)) it was also available in Microsoft Cω experimental language, however it was not documented anywhere (I just tried it when I was looking at Cω and it just worked).
You can't do this with native types, but there's nothing stopping you from creating a variable object (presuming you are using an OO language) which has a range of values or even a probability density function rather than an actual value.
You will also need to define all the mathematical operators between your variables and your variables and native scalars. Same goes for the equality and assignment operators.
numpy arrays do something similar for vectors and matrices.
That's also the kind of thing you can do in Prolog. You define rules that constraint your variables and then let Prolog resolve them ...
It takes some time to get used to it, but it is wonderful for certain problems once you know how to use it ...
Damien Conways Quantum::Superpositions might do what you want,
https://metacpan.org/pod/Quantum::Superpositions
You might need your crack-pipe however.
What you're asking seems to be how to implement a Fuzzy Logic system. These have been around for some time and you can undoubtedly pick up a library for the common programming languages quite easily.
You could use a struct and handle the operations manualy. Otherwise, no a variable only has 1 value at a time.
A variable is nothing more than an address into memory. That means a variable describes exactly one place in memory (length depending on the type). So as long as we have no "quantum memory" (and we dont have it, and it doesnt look like we will have it in near future), the answer is a NO.
If you want to program and to modell this behaviour, your way would be to use a an array (with length equal to the number of max. multiple values). With this comes the increased runtime, hence the computations must be done on each of the values (e.g. x+y, must compute with 2 different values x1+y1, x2+y2, x1+y2 and x2+y1).
In Perl , you can .
If you use Scalar::Util , you can have a var take 2 values . One if it's used in string context , and another if it's used in a numerical context .