If Statement Optimization - comparing character strings vs constant boolean flags

If Statement Optimization - comparing character strings vs constant boolean flags - optimization

Consider the following Java code:
public void DoStuff(String[] strings, boolean preEval)
{
final String compareTo = "A Somewhat Long String of Characters";
for ( int i = 0; i < strings.length; ++i )
{
if ( preEval )
{
if( strings[i].equals(compareTo) )
{
//do something process intensive
}
}
//do something process intensive
}
}
Now pay attention to if (preEval) and the inner statement within that. If the algorithm in use requires a condition such as preEval, does it make sense to include the preEval condition for the purposes of code optimization?
From my understanding, evaluating to see if a conditional flag resolves to true or false is much faster than iterating through a collection of characters and comparing each character within that collection with another corresponding character from a different collection.
My knowledge of assembly is about 30% I'd say in terms of the internals and opcodes/mnemonics involved, hence why I'm asking this question.
Update
Note: the code posted here is meant to be language independent; I simply chose Java just for the sake of something tangible and easy to read, as well as something which is widely known among the programmer community.

I would say that this would probably be an optimization in most cases.
That said, you should not spend time on optimizing code that has not been measured.
This might for example not be a worthwhile optimization if:
most of your cases involves few strings or very short strings.
it takes a long time to calculate the preEval parameter before calling the function.
Measure your code under realistic circumstances, identify your bottle necks, then you optimize.

A less costly approach might be to use a HashSet::contains(string) method to check for existence of a string in a collection. You can probably design away the need for string compares while iterating using a HashSet of strings or a HashMap keyed by String.
I always try to use a HashMap where i can to avoid conditional logic entirely.
_ryan

Related

Is it acceptable to use `to` to create a `Pair`?

to is an infix function within the standard library. It can be used to create Pairs concisely:
0 to "hero"
in comparison with:
Pair(0, "hero")
Typically, it is used to initialize Maps concisely:
mapOf(0 to "hero", 1 to "one", 2 to "two")
However, there are other situations in which one needs to create a Pair. For instance:
"to be or not" to "be"
(0..10).map { it to it * it }
Is it acceptable, stylistically, to (ab)use to in this manner?

Just because some language features are provided does not mean they are better over certain things. A Pair can be used instead of to and vice versa. What becomes a real issue is that, does your code still remain simple, would it require some reader to read the previous story to understand the current one? In your last map example, it does not give a hint of what it's doing. Imagine someone reading { it to it * it}, they would be most likely confused. I would say this is an abuse.
to infix offer a nice syntactical sugar, IMHO it should be used in conjunction with a nicely named variable that tells the reader what this something to something is. For example:
val heroPair = Ironman to Spiderman //including a 'pair' in the variable name tells the story what 'to' is doing.
Or you could use scoping functions
(Ironman to Spiderman).let { heroPair -> }

I don't think there's an authoritative answer to this.  The only examples in the Kotlin docs are for creating simple constant maps with mapOf(), but there's no hint that to shouldn't be used elsewhere.
So it'll come down to a matter of personal taste…
For me, I'd be happy to use it anywhere it represents a mapping of some kind, so in a map{…} expression would seem clear to me, just as much as in a mapOf(…) list.  Though (as mentioned elsewhere) it's not often used in complex expressions, so I might use parentheses to keep the precedence clear, and/or simplify the expression so they're not needed.
Where it doesn't indicate a mapping, I'd be much more hesitant to use it.  For example, if you have a method that returns two values, it'd probably be clearer to use an explicit Pair.  (Though in that case, it'd be clearer still to define a simple data class for the return value.)

You asked for personal perspective so here is mine.
I found this syntax is a huge win for simple code, especial in reading code. Reading code with parenthesis, a lot of them, caused mental stress, imagine you have to review/read thousand lines of code a day ;(

keyword - FFL: Where vs. Let

I was trying to understand the following code:
def() ->commands
if(deferred_passive_abilities != [],
let [{ability: class passive_ability, creature: class creature}] items = [];
let found = false;
map(deferred_passive_abilities,
if(cmd = null, add(items, [value]), [cmd, set(found, true)])
where cmd = value.ability.static_effect(me, value.creature));
if(found,
set(deferred_passive_abilities, items);
evaluate_deferred_passive_abilities(),
set(deferred_passive_abilities, []))
)
Haskell appears to have both let and where, but I didn't learn much by a superficial reading of their haskell docs. They also have a let...in, which I didn't understand but it would be good to know if FFL has that.
So, what is the significance of using let versus where? Was it necessary to use let here? (Also, possibly another question: why does it need those semicolons?)

Using let introduces a variable that can be modified. Note how found and items are modified. By contrast, where always introduces immutable symbols.
Semi-colons are used in FFL to create a command pipeline. Normally in FFL, an entire formula is evaluated, resulting in a command or list of commands, and then the commands are executed.
When a semi-colon is present, everything before the semi-colon is treated as an entirely separate formula to everything after the semi-colon. The first formula is evaluated and executed and then the second formula is evaluated and executed.
Semi-colons effectively allow a much more procedural programming style in FFL, without semi-colons it is a purely functional language.

Never knew of let in FFL before this, must be very rare.
Regardless of the insights, the semicolon has to be absolutely necessary, in order to force execution before using the bound variable. In other words, until used the semicolon, the variable does not exist. Does not have a bound value.
This is a big difference to where, which doesn't need of semicolons.
Given the semicolon is not a construction for complete beginners, I could somewhat recommend beginners about variables to stick in where until understanding the trickery of the semicolons.

SonarLint - questions about some of the rules for VB.NET

The large majority of SonarLint rules that I've come across in Java seemed plausible and justified. However, ever since I've started using SonarLint for VB.NET, I've come across several rules that left me questioning their usefulness or even whether or not they are working correctly.
I'd like to know if this is simply a problem of me using some VB.NET constructs in a suboptimal way or whether the rule really is flawed.
(Apologies if this question is a little longer. I didn't know if I should create a separate question for each individual rule.)
The following rules I found to leave some cases unconsidered that would actually turn up as false-positives:
S1871: Two branches in the same conditional structure should not have exactly the same implementation
I found this one to bring up a lot of false-positives for me, because sometimes the order in which the conditions are checked actually does matter. Take the following pseudo code as example:
If conditionA() Then
doSomething()
ElseIf conditionB() AndAlso conditionC() Then
doSomethingElse()
ElseIf conditionD() OrElse conditionE() Then
doYetAnotherThing()
'... feel free to have even more cases in between here
Else Then
doSomething() 'Non-compliant
End If
If I wanted to follow this Sonar rule and still make the code behave the same way, I'd have to add the negated version of each ElseIf-condition to the first If-condition.
Another example would be the following switch:
Select Case i
Case 0 To 40
value = 0
Case 41 To 60
value = 1
Case 61 To 80
value = 3
Case 81 To 100
value = 5
Case Else
value = 0 'Non-compliant
There shouldn't be anything wrong with having that last case in a switch. True, I could have initialized value beforehand to 0 and ignored that last case, but then I'd have one more assignment operation than necessary. And the Java ruleset has conditioned me to always put a default case in every switch.
S1764: Identical expressions should not be used on both sides of a binary operator
This rule does not seem to take into account that some functions may return different values every time you call them, for instance collections where accessing an element removes it from the collection:
stack.Push(stack.Pop() / stack.Pop()) 'Non-compliant
I understand if this is too much of an edge case to make special exceptions for it, though.
The following rules I am not actually sure about:
S3385: "Exit" statements should not be used
While I agree that Return is more readable than Exit Sub, is it really bad to use a single Exit For to break out of a For or a For Each loop? The SonarLint rule for Java permits the use of a single break; in a loop before flagging it as an issue. Is there a reason why the default in VB.NET is more strict in that regard? Or is the rule built on the assumption that you can solve nearly all your loop problems with LINQ extension methods and lambdas?
S2374: Signed types should be preferred to unsigned ones
This rule basically states that unsigned types should not be used at all because they "have different arithmetic operators than signed ones - operators that few developers understand". In my code I am only using UInteger for ID values (because I don't need negative values and a Long would be a waste of memory in my case). They are stored in List(Of UInteger) and only ever compared to other UIntegers. Is this rule even relevant to my case (are comparisons part of these "arithmetic operators" mentioned by the rule) and what exactly would be the pitfall? And if not, wouldn't it be better to make that rule apply to arithmetic operations involving unsigned types, rather than their declaration?
S2355: Array literals should be used instead of array creation expressions
Maybe I don't know VB.NET well enough, but how exactly would I satisfy this rule in the following case where I want to create a fixed-size array where the initialization length is only known at runtime? Is this a false-positive?
Dim myObjects As Object() = New Object(someOtherList.Count - 3) {} 'Non-compliant
Sure, I could probably just use a List(Of Object). But I am curious anyway.

Thanks for raising these points. Note that not all rules apply every time. There are cases when we need to balance between false positives/false negatives/real cases. For example with identical expressions on both sides of an operator rule. Is it a bug to have the same operands? No it's not. If it was, then the compiler would report it. Is it a bad smell, is it usually a mistake? Yes in many cases. See this for example in Roslyn. Should we tune this rule to exclude some cases? Yes we should, there's nothing wrong with 2 << 2. So there's a lot of balancing that needs to happen, and we try to settle for an implementation that brings the most value for the users.
For the points you raised:
Two branches in the same conditional structure should not have exactly the same implementation
This rule generally states that having two blocks of code match exactly is a bad sign. Copy-pasted code should be avoided for many reasons, for example if you need to fix the code in one place, you'll need to fix it in the other too. You're right that adding negated conditions would be a mess, but if you extract each condition into its own method (and call the negated methods inside them) with proper names, then it would probably improves the readability of your code.
For the Select Case, again, copy pasted code is always a bad sign. In this case you could do this:
Select Case i
...
Case 0 To 40
Case Else
value = 0 ' Compliant
End Select
Or simply remove the 0-40 case.
Identical expressions should not be used on both sides of a binary operator
I think this is a corner case. See the first paragraph of the answer.
"Exit" statements should not be used
It's almost always true that by choosing another type of loop, or changing the stop condition, you can get away without using any "Exit" statements. It's good practice to have a single exit point from loops.
Signed types should be preferred to unsigned ones
This is a legacy rule from SonarQube VB.NET, and I agree with you that it shouldn't be enabled by default in SonarLint. I created the following ticket in our JIRA: https://jira.sonarsource.com/browse/SLVS-1074
Array literals should be used instead of array creation expressions
Yes, it seems to be a false positive, we shouldn't report on array creations when the size is explicitly specified. https://jira.sonarsource.com/browse/SLVS-1075

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to be able to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc, but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
value = reflect.ValueOf(m).Elem()
for i := 0; i < value.NumField(); i++ {
if t, ok := quick.Value(value.Field(i).Type(), r); ok {
value.Field(i).Set(t)
}
}
return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with random generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
out = in // Make sure you return something
if in.Kind() == reflect.String {
instr := in.String()
t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
outstr, _, _ := transform.String(t, instr)
out = reflect.ValueOf(outstr)
}
return
}
Now, I think I've tried just about anything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!

Confusingly the Generate interface needs a function using the type not a the pointer to the type. You want your type signature to look like
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the type of the generate function from m Metadata to m *Metadata and see that Hi Mom! never prints.
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01233456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
var buffer bytes.Buffer
for i := 0; i < size; i++ {
buffer.WriteString(string(latin[rand.Intn(len(latin))]))
}
s := LatinString(buffer.String())
return reflect.ValueOf(s)
}
playground
Edit: also this library is pretty cool, thanks for showing it to me

The answer to my own question is, it seems, a combination of the answers provided in the comments by #nj_ and #jimb and the answer provided by #benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question"
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin))
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There's some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
When I filter the generated string on values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters in the set used by the testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) get a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.

What is your preferred boolean pair: 1/0 Yes/No True/False?

When dealing with MySQL, I typically use the BOOLEAN type, which is equivalent to TINYINT(1), or 1/0
In most languages I work with, true/false is preferred
When displaying forms, sometimes "Yes / No" makes more sense

enum Bool
{
True,
False,
FileNotFound
};
http://thedailywtf.com/Articles/What_Is_Truth_0x3f_.aspx

In code: true/false.
In the UI: Yes/No or OK/Cancel

true and false makes a lot more sense to me, in code - partly through familiarity, I'm sure. I suspect I'd get used to yes and no pretty quickly. 1 and 0 really doesn't work for me though.
Consider the expression
age == 5
It's a test for truth-hood. Is the value of age 5? Yes, it's true. Both "yes" and "no" would be fine for me, but the idea that the answer to the question "Is the value of age 5?" is "1" seems pretty counter-intuitive to me. Just because that's the typical binary representation of truth-hood doesn't mean it's a useful one at a higher abstraction.

Which is easier to read?
while(true) {}
while(yes) {}
while(1) {}
I'll stick with true for most cases.

Here are rules I live by...
Rule #1
Use well defined constants in the programming languages that you use to communicate with the CPU, i.e., true/false in most modern cases for boolean values. If a database offers a boolean type or some such equivalent, of course it should be used.
Rule #2
Interact with users of your software by using their preferred language and idiom, i.e., Yes/No questions should offer Yes/No (or perhaps an alternative to No, such as Cancel).
Rule #3
Uncertainty should be expressed in terms of scope, i.e., "that depends", which will be followed up by the question "on what?". I know developers who answer the question by copying and pasting just about every dependency they may need into every code file of a project as a 'using' statement. That's just sloppy, and please bother to alphabetize or at least group namespaces together.
When a bool Just Isn't Enough
Incidentally, an interesting twist to this, available in C#, is Nullable;
The you can write
Nullable<bool> RespondToIritatingQuestion()
{
return new Nullable<bool>();
}
OR
bool? RespondToIritatingQuestionWithSytle()
{
return new bool?();
}
and the questioner would need to evaluate your response before even knowing what the answer, if there is one, might be...
bool? answer = RespondToIritatingQuestionWithStyle();
if (answer.HasValue)
Trace.WriteLine("The bloke responded with " + answer.Value.ToString());
else
Trace.WriteLine("The bloke responded with 'depends'.");

1 or 0 for SQL. SQL has a boolean type for a reason. Also, in very large databases, it can effect performance.

I use booleans for true/false fields in databases. Some people use ENUM('true', 'false'), but thats not my preference. For programming languages, I always use true/false, even if setting it to 0 or 1 will work. And if the form requires 'yes'/'no', I still use booleans to represent the values, but display them as strings that are more logical.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas