When merging runs A,B in Timsort (in function merge_lo) it says "Must also have that ssa.keys[na-1] belongs at the end of the merge". Why? - timsort

A and B are adjacent runs on a stack, with A being the bottom and smaller run(If B were smaller merge_hi would be performing the merging but the same question applies there as well).I have been trying to figure why the last element of A MUST be bigger than the last element of B because I don't see how the run decomposition (or the rest of the algorithm) would ensure that condition. Also, in the same function the code seems to suggest that the first element of B is always smaller than the first element of A which I also don't understand why, but I'm guessing the answer to this question is tied to the answer of the first question.

In short, galloping is the reason. That's why we haven't seen it at first, because I thought gallop_{left,right} is called only from merge_{lo,hi}. But it isn't true. gallop_... are also called from merge_at, before merge_{lo,hi} are called, in order to find "Where does b start in a?" and "Where does a end in b?". Those calls (and subsequent code) change ssb (and its length nb), and also ssa and its length na, such that the invariant in the title is satisfied.
That is, the point is that A and B are not "adjacent runs on the runstack" as found in the original list to be sorted. Before calling merge_{lo,hi}, they are trimmed so that the condition in the title holds. That is, prior to A being merged with B, the elements of B greater than the last element of A are explicitly left out of consideration. In other words, "Must also have that ssa.keys[na-1] belongs at the end of the merge" is true not because of some special property of ssa.keys[na-1], but because we have defined "the end of the merge" in such a way. :-)
That also means that when you implement Timsort yourself, and you leave out the galloping, you must also leave out that optimization being talked about here, otherwise the code won't work right.

Related

Unable to overwrite a Column Value using Pandas

I'm planning to overwrite a Field value using pandas but that does not seem to work. Am i missing anything as part of the code below?
`for row_no in range(df.shape[0]):
rowIndex = df.index[row_no]
if re.search('Fex|Process|PIP|VIP|Generic|Mobility', df.loc[rowIndex].VPC_Sub_Cat, re.I):
print(df.loc[rowIndex].Headline)
print(df.loc[rowIndex].VPC_Sub_Cat)
print(df.loc[rowIndex].Final_Result)
df.loc[rowIndex].Final_Result = 0
print(df.loc[rowIndex].Final_Result)
break`
The output that I get after running this piece of code is:
This is the description of the issue...
VPC-Generic
1
1
Also can i achieve the same thing using a function and applying that on a data frame? kindly let me know.
df.loc[rowIndex].Final_Result is equals to (in most situation though...)
df.loc[rowIndex]['Final_Result']
This will cause a chained assignment (see here #Warning)
Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy.
And then from Returning a view versus a copy
But it turns out that assigning to the product of chained indexing has inherently unpredictable results.
So using df.loc[rowIndex, 'Final_Result'] to make sure that the value you assigned is view, not copy.

SonarLint - questions about some of the rules for VB.NET

The large majority of SonarLint rules that I've come across in Java seemed plausible and justified. However, ever since I've started using SonarLint for VB.NET, I've come across several rules that left me questioning their usefulness or even whether or not they are working correctly.
I'd like to know if this is simply a problem of me using some VB.NET constructs in a suboptimal way or whether the rule really is flawed.
(Apologies if this question is a little longer. I didn't know if I should create a separate question for each individual rule.)
The following rules I found to leave some cases unconsidered that would actually turn up as false-positives:
S1871: Two branches in the same conditional structure should not have exactly the same implementation
I found this one to bring up a lot of false-positives for me, because sometimes the order in which the conditions are checked actually does matter. Take the following pseudo code as example:
If conditionA() Then
doSomething()
ElseIf conditionB() AndAlso conditionC() Then
doSomethingElse()
ElseIf conditionD() OrElse conditionE() Then
doYetAnotherThing()
'... feel free to have even more cases in between here
Else Then
doSomething() 'Non-compliant
End If
If I wanted to follow this Sonar rule and still make the code behave the same way, I'd have to add the negated version of each ElseIf-condition to the first If-condition.
Another example would be the following switch:
Select Case i
Case 0 To 40
value = 0
Case 41 To 60
value = 1
Case 61 To 80
value = 3
Case 81 To 100
value = 5
Case Else
value = 0 'Non-compliant
There shouldn't be anything wrong with having that last case in a switch. True, I could have initialized value beforehand to 0 and ignored that last case, but then I'd have one more assignment operation than necessary. And the Java ruleset has conditioned me to always put a default case in every switch.
S1764: Identical expressions should not be used on both sides of a binary operator
This rule does not seem to take into account that some functions may return different values every time you call them, for instance collections where accessing an element removes it from the collection:
stack.Push(stack.Pop() / stack.Pop()) 'Non-compliant
I understand if this is too much of an edge case to make special exceptions for it, though.
The following rules I am not actually sure about:
S3385: "Exit" statements should not be used
While I agree that Return is more readable than Exit Sub, is it really bad to use a single Exit For to break out of a For or a For Each loop? The SonarLint rule for Java permits the use of a single break; in a loop before flagging it as an issue. Is there a reason why the default in VB.NET is more strict in that regard? Or is the rule built on the assumption that you can solve nearly all your loop problems with LINQ extension methods and lambdas?
S2374: Signed types should be preferred to unsigned ones
This rule basically states that unsigned types should not be used at all because they "have different arithmetic operators than signed ones - operators that few developers understand". In my code I am only using UInteger for ID values (because I don't need negative values and a Long would be a waste of memory in my case). They are stored in List(Of UInteger) and only ever compared to other UIntegers. Is this rule even relevant to my case (are comparisons part of these "arithmetic operators" mentioned by the rule) and what exactly would be the pitfall? And if not, wouldn't it be better to make that rule apply to arithmetic operations involving unsigned types, rather than their declaration?
S2355: Array literals should be used instead of array creation expressions
Maybe I don't know VB.NET well enough, but how exactly would I satisfy this rule in the following case where I want to create a fixed-size array where the initialization length is only known at runtime? Is this a false-positive?
Dim myObjects As Object() = New Object(someOtherList.Count - 3) {} 'Non-compliant
Sure, I could probably just use a List(Of Object). But I am curious anyway.
Thanks for raising these points. Note that not all rules apply every time. There are cases when we need to balance between false positives/false negatives/real cases. For example with identical expressions on both sides of an operator rule. Is it a bug to have the same operands? No it's not. If it was, then the compiler would report it. Is it a bad smell, is it usually a mistake? Yes in many cases. See this for example in Roslyn. Should we tune this rule to exclude some cases? Yes we should, there's nothing wrong with 2 << 2. So there's a lot of balancing that needs to happen, and we try to settle for an implementation that brings the most value for the users.
For the points you raised:
Two branches in the same conditional structure should not have exactly the same implementation
This rule generally states that having two blocks of code match exactly is a bad sign. Copy-pasted code should be avoided for many reasons, for example if you need to fix the code in one place, you'll need to fix it in the other too. You're right that adding negated conditions would be a mess, but if you extract each condition into its own method (and call the negated methods inside them) with proper names, then it would probably improves the readability of your code.
For the Select Case, again, copy pasted code is always a bad sign. In this case you could do this:
Select Case i
...
Case 0 To 40
Case Else
value = 0 ' Compliant
End Select
Or simply remove the 0-40 case.
Identical expressions should not be used on both sides of a binary operator
I think this is a corner case. See the first paragraph of the answer.
"Exit" statements should not be used
It's almost always true that by choosing another type of loop, or changing the stop condition, you can get away without using any "Exit" statements. It's good practice to have a single exit point from loops.
Signed types should be preferred to unsigned ones
This is a legacy rule from SonarQube VB.NET, and I agree with you that it shouldn't be enabled by default in SonarLint. I created the following ticket in our JIRA: https://jira.sonarsource.com/browse/SLVS-1074
Array literals should be used instead of array creation expressions
Yes, it seems to be a false positive, we shouldn't report on array creations when the size is explicitly specified. https://jira.sonarsource.com/browse/SLVS-1075

VB.NET "For each" versus ".GetUpperBound(0)"

I would like to know what is preferred...
Dim sLines() As String = s.Split(NewLine)
For each:
For Each sLines_item As String In sLines
.GetUpperBound:
For i As Integer = 0 To sLines.GetUpperBound(0)
I have no idea why the "For Each" was introduced for such cases. Until now I have only used .GetUpperBound, and I don't see any PRO for the "For Each".
Thank you
ps: When I use ."GetUpperBound(0)", I do know that I am iterating over the vector.
The "For Each" in contrast sounds like "I don't care in which order the vector is given to me". But that is just personal gusto, I guess.
Short answer: Do not use GetUpperBound(). The only advantage of GetUpperBound() is that it works for multi-dimensional arrays, where Length doesn't work. However, even that usage is outdated since there is Array.GetLength() available that takes the dimension parameter. For all other uses, For i = 0 to Array.Length - 1 is better and probably the fastest option.
It's largely a personal preference.
If you need to alter the elements of the array, you should use For i ... because changing sLines_item will not affect the corresponding array element.
If you need to delete elements of the array, you can iterate For i = ubound(sLines) to 0 step -1 (or the equivalent).
Short answer
You should always use For Each on IEnumerable types unless you have no other choice.
Long answer
Contrary to the popular understanding, For Each is not a syntactic sugar on top of For Next. It will not necessarily iterate over every element of its source. It is a syntactic sugar on top of IEnumerable.GetEnumerator(). For Each will first get an enumerator to its source then loop until it cannot enumerate further. Basically, it will be replaced by the following code. Keep in mind that this is an oversimplification.
' Ask the source for a way to enumerate its content in a forward only manner.
Dim enumerator As IEnumerator = sLines.GetEnumerator()
' Loop until there is no more element in front of us.
While enumerator.Next() Then
' Invoke back the content of the for each block by passing
' the currently enumerated element.
forEachContent.Invoke(enumerator.Current)
End While
The major difference between this and a classical For Next loop is that it does not depend on any length. This fixes two limitations in modern .NET languages. The first one has to do with the Count method. IEnumerable provides a Count method, but the implementation might not be able to keep track of the actual amount of elements it stores. Because of this, calling IEnumerable.Count might cause the source to be iterated over to actually count the amount of element it contains. Moreover, doing this as the end value for traditional For Next loop will cause this process to be done for every element in the loop. This is very slow. Here is an illustration of this process:
For i As Integer = 0 To source.Count() ' This here will cause Count to be
' evaluated for every element in source.
DoSomething(source(i))
Next
The use of For Each fixes this by never requesting the length of the source.
The second limitation it fixes is the lack of a concept for arrays with infinite amount of elements. An example of such cases would be an array containing every digit of PI where each digit is only calculated when you request them. This is where LINQ makes its entrance and really shines because it enables you to write the following code:
Dim piWith10DigitPrecision = From d In InfinitePiSource
Take 10
Dim piWith250DigitPrecision = From d In InfinitePiSource
Take 250
Dim infite2PiSource = From d In InfinitePiSource
Select d * 2
Now, in an infinite source, you cannot depend on a length to iterate over all of its elements. It has an infinite length thus making a traditional For Next loop an infinite loop. This does not change anything for the first two examples I have given with pi because we explicitly provides the amount of elements we want, but it does for the third one. When would you stop iterating? For Each, when combined with Yield (used by the Take operator), makes sure that you never iterate until you actually requests a specific value.
You might have already figured it out by now but these two things means that For Each effectively have no concept of bounds because it simply does not require them. The only use for GetLowerBound and GetUpperBound are for non-zero-indexed arrays. For instance, you might have an array that indexes values from 1 instead of zero. Even then, you only need GetLowerBound and Length. Obviously, this is only if the position of the element in the source actually matters. If it does not, you can still use For Each to iterate over all elements as it is bound agnostic.
Also, as already mentioned, GetLength should be used for zero-indexed multi-dimensional arrays, again, only if the position of the element matters and not just the element itself.

VBA Overflow Error 6 -- simply when terminating a class

I am trying to debug code that typically functions properly and has been in production for a while, but has some errors associated with particular case runs.
I have a class called "Guarantee", and a variable / object, "myDem", which is of that class. The class has an associated function called "NumberOfGuaranteeDays". The first image given below shows a Long variable called "numRows" being assigned to this function call on "myDem". When this assigment is made -- and, therefore, the "NumberOfGuaranteeDays" function is called -- I receive an Overflow error.
After the function call "NumberOfGuaranteeDays" is run, which is a fairly complex call with many sub-functions itself, then class attempts to terminate itself, and return the value (the number of days, which is an integer... in this case it's 32561). It is during this termination step and the assignment of "numRows" to the value 32561, when the error occurs.
Here I simply demonstrate that the very next step within the code, if I step through it, is where the error message is returned back to me.
Finally I wanted to provide "proof" that the value assigned to "numRows" is 32561, of type Integer, which can acceptably be assigned to a Long. Note in the far right in the watch part of the window, the value "res" is the value which is returned from the "NumberOfGuaranteeDays" call, which is then assigned to the variable "numRows".
As far as I can tell, there are only 2 possibilities for why a crash can be occurring:
There is an error in the attempt to terminate the class. I don't understand how this could lead to an "overflow" error, though.
There is an error in the assignment of the value calculated from "NumberOfGuaranteeDays" to the variable "numRows". This sort of assignment could potentially have an overflow, but not in this case. The return from "NumberOfGuaranteeDays", which is the "res" integer set at 32561, is assigned to "numRows", which is a Long.
So, since neither of these possibilities that I can imagine make sense, I figure there must be another possibility I cannot see. Thank you all in advance for the help!
I opted to put in cut & pasted images instead of writing as code because so much of what's critical to understanding the steps is seeing "proof" of where I am in the debugging stages. If actual code snippets would help, let me know.
Thank you!
Mike

naming a function that exhibits "set if not equal" behavior

This might be an odd question, but I'm looking for a word to use in a function name. I'm normally good at coming up with succinct, meaningful function names, but this one has me stumped so I thought I'd appeal for help.
The function will take some desired state as an argument and compare it to the current state. If no change is needed, the function will exit normally without doing anything. Otherwise, the function will take some action to achieve the desired state.
For example, if wanted to make sure the front door was closed, i might say:
my_house.<something>_front_door('closed')
What word or term should use in place of the something? I'd like it to be short, readable, and minimize the astonishment factor.
A couple clarifying points...
I would want someone calling the function to intuitively know they didn't need to wrap the function an 'if' that checks the current state. For example, this would be bad:
if my_house.front_door_is_open():
my_house.<something>_front_door('closed')
Also, they should know that the function won't throw an exception if the desired state matches the current state. So this should never happen:
try:
my_house.<something>_front_door('closed')
except DoorWasAlreadyClosedException:
pass
Here are some options I've considered:
my_house.set_front_door('closed')
my_house.setne_front_door('closed') # ne=not equal, from the setne x86 instruction
my_house.ensure_front_door('closed')
my_house.configure_front_door('closed')
my_house.update_front_door('closed')
my_house.make_front_door('closed')
my_house.remediate_front_door('closed')
And I'm open to other forms, but most I've thought of don't improve readability. Such as...
my_house.ensure_front_door_is('closed')
my_house.conditionally_update_front_door('closed')
my_house.change_front_door_if_needed('closed')
Thanks for any input!
I would use "ensure" as its succinct, descriptive and to the point:
EnsureCustomerExists(CustomerID)
EnsureDoorState(DoorStates.Closed)
EnsureUserInterface(GUIStates.Disabled)
Interesting question!
From the info that you have supplied, it seems to me that setstate (or simply set, if you are setting other things than states) would be fine, though ensure is good if you want to really emphasize the redundancy of an if.
To me it is however perfectly intuitive that setting a state does not throw an exception, or require an if. Think of setting the state of any other variable:
In C:
int i;
i = 5; // Would you expect this to throw an exception if i was already 5?
// Would you write
if (i != 5)
i = 5;
// ?
Also it only takes about one sentence to document this behaviour:
The function does nothing if the
current state equals the requested
state.
EDIT: Actually, thinking about it, if it is really important to you (for some reason) that the user is not confused about this, I would in fact pick ensure (or some other non-standard name). Why? Because as a user, a name like that would make me scratch my head a bit and look up the documentation ("This is more than just an ordinary set-function, apparently").
EDIT 2: Only you know how you design your programs, and which function name fits in best. From what you are saying, it seems like your setting functions sometimes throw exceptions, and you need to name a setting function that doesn't - e.g. set_missile_target. If that is the case, I think you should consider the set_if, set_when, set_cond or cond_set names. Which one would kind of depend on the rest of your code. I would also add that one line of documentation (or two, if you're generous), which clarifies the whole thing.
For example:
// Sets missile target if current target is not already the requested target,
// in which case it does nothing. No exceptions are thrown.
function cond_set_missile_target ()
or function cond_set_MissileTarget ()
or function condSet_MissileTarget ()
or function condSetMissileTarget ()
ensure is not so bad, but to me it implies only that there is additional logic required to set the state (e.g. multiple states tied together, or other complications). It helps to make the user avoid adding unnecessary ifs, but it does not help much with the exception issue. I would expect an ensure function to throw an exception sooner than a set function, since the ensure function clearly has more responsibilities for, well, ensuring that this setting operation is in fact done right.
I'd go for ensure for the function you describe. I'd also use camelCase, but I suppose you may be in a language that prefers underscores.
You could always document (shock!) your API so that others don't make the mistakes you describe.